THE COEFFICIENT OF EQUIPROPORTION AS A
CRITERION OF HIERARCHY¹


STUART CARTER DODD 


Fellow of the National Research Council, U. S. A. and of the Rockefeller Foun- 
dation, University College, London 


The term “equiproportion” is proposed² to define exactly the
looser concept of “hierarchy.” Equiproportion is defined by equation


$r_{ap} / r_{aq} = r_{bp} / r_{bq}$    (1)


or its “tetrad difference” form:
$r_{ap} r_{bq} - r_{aq} r_{bp} = 0$    (1a)


A table of intercorrelations is equiproportional when all its tetrad 
differences are zero, within limits of the probable error of sampling. 
This more exact terminology seems desirable for hierarchy often 
meant subjective estimation of the tendency of the coefficients in the 
rows and columns to decrease; or else it meant the particular and 
approximative criterion of the intercolumnar correlation equalling 
unity. It further connoted a ranking one above another, which was 
misleading since all the intercorrelations may be equal or zero and still 
equiproportion be perfect. 
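
The definition can be checked mechanically. The following is only a minimal sketch, not the author's procedure: it forms the three tetrad differences of every set of four variables and confirms the remark above that a table with all intercorrelations equal is perfectly equiproportional. The helper names and the numerical values are hypothetical.

```python
from itertools import combinations


def tetrad_differences(r):
    """Return all tetrad differences of a symmetric correlation table."""
    diffs = []
    for a, b, c, d in combinations(range(len(r)), 4):
        # Four variables give three distinct tetrad differences.
        diffs.append(r[a][b] * r[c][d] - r[a][c] * r[b][d])
        diffs.append(r[a][b] * r[c][d] - r[a][d] * r[b][c])
        diffs.append(r[a][c] * r[b][d] - r[a][d] * r[b][c])
    return diffs


def equal_table(n, value):
    """Hypothetical table in which every intercorrelation has the same value."""
    return [[1.0 if i == j else value for j in range(n)] for i in range(n)]


flat = equal_table(5, 0.5)
flat_diffs = tetrad_differences(flat)

# Disturb a single correlation (kept symmetric) to break equiproportion.
bent = equal_table(5, 0.5)
bent[0][1] = bent[1][0] = 0.7
bent_diffs = tetrad_differences(bent)

print(len(flat_diffs), max(abs(t) for t in flat_diffs), max(abs(t) for t in bent_diffs))
```

In observed data the differences would of course be judged against their probable error rather than against exact zero.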

In practice, tables which are strongly equiproportional will never
be quite perfectly so due to (a) errors of sampling or (b) the presence of
group* factors over and above the general and the n specific factors into
which the n equiproportional variables can always be analyzed.


1 Indebtedness is acknowledged to the criticisms of Professor C. Spearman and 
Professor G. H. Thomson in preparing this study. 

2 The proposal is made jointly with Prof. C. Spearman. 

* These group factors must be distinguished from the dice-like group factors 
dealt with by Professor G. H. Thomson, else the present confusion will be increased. 
For an exposition of the different properties of the different types of factors see 
Dodd, S. C.: A Review of the Theory of Factors (shortly to appear in Psychological 
Review). Garnett has shown that the dice-like group factors when arranged 
The object of this paper is to develop the best criterion of equipro-
portion in practice. It must measure and interpret both the absolute
amount of the divergence from perfect equiproportion (presumably 
due to group factors) and the probability that this amount is, or is not, 
due to sampling error. 

As an approximative criterion the intercolumnar correlation equal- 
ling unity was first developed because it enabled estimating the 
sampling error. But this proved unsatisfactory because: 

(a) The intercolumnar r might be unity although (1) was not 
satisfied, as instanced by such coefficients as .4, .5, .6 being matched 
with .3, .5, .7 in the next column. 

(b) When the r’s in a column tend to be equal, the intercolumnar 
r tends to become an indeterminate quantity since all deviations are 
shrinking towards zero. Before this point is reached, however, those 
deviations become smaller than and therefore swamped by, the prob- 
able error of the r’s. 

(c) To avoid this a “correctional standard” was required which 
rejected columns whose deviations were not significant. This meant 
that this criterion of hierarchy could then be only applied to a part of 
the data. 
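
Objection (a) can be verified directly with the figures quoted there. This sketch uses a plain product-moment correlation written out by hand; the function name is an illustrative choice.

```python
from math import sqrt


def pearson(xs, ys):
    """Product-moment correlation of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


col_a = [0.4, 0.5, 0.6]   # coefficients in one column
col_b = [0.3, 0.5, 0.7]   # matched coefficients in the next column

r_cols = pearson(col_a, col_b)
ratios = [a / b for a, b in zip(col_a, col_b)]

print(r_cols, ratios)
```

The intercolumnar r comes out at unity, yet the row-by-row ratios are unequal, so equation (1) is not satisfied.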

This intercolumnar criterion was replaced by the tetrad difference
criterion¹ when its probable error was worked out by Spearman and
Holzinger. The full formula for the probable error of the fundamental
criterion of a single tetrad was found, together with several briefer approxima-
tions to it.





according to the laws of probability, are not inconsistent with, but only a variant 
mathematical form of expressing the general and specific factors of the two factor 
theory. Otherwise expressed, the dice-like group factors are only mathematical 
functions of the general and specific factors, while conversely the latter are only 
mathematical functions of the former. Either can be expressed in terms of the
other. The conversion formula is $g = \frac{1}{n}(e_1 + e_2 + \cdots + e_n)$, where g is
the general factor of the two factor theory and the e’s are the dice-like elements
composing Thomson’s group factors. Garnett, J. C. M.: The Single General 
Factor in Dissimilar Mental Measurements, British Journal of Psychology, Vol. 
X, 1920. 

The group factors referred to in this paper may be defined as that part of the 
observed variables which produces residual intercorrelation after the general 
factor has been partialled out, as described later. 


1 For an exposition of this criterion in detail together with the exact and the 
approximative probable error formule see Spearman, C.: “Abilities of Man.” 
Macmillan Company, 1927, pp. 415, Appendix II. 
To summarize the table a formula for a general probable error was
worked out which depends upon the mean and the standard deviation 
of all the correlation coefficients in the table. The proof of this 
formula has not yet been published and the author states that it 
‘in some theoretical points requires elucidation still.’ This present 
criterion of equiproportion is the significance of the median tetrad 
difference as compared with this probable error from the whole table. 
The ratio of such quantity to its probable error may be called the 
significance ratio—a term of convenience in interpreting many types 
of statistical data. 

A significance ratio of unity indicates a tetrad difference which 
would as often as not occur by mere sampling and which, therefore, 
needs no further explanation. If the significance ratio is larger than 
5, it is currently considered established that, with quite high probabil-
ity, the quantity is not due to sampling error. In the case of equipro-
portion this means that a departure of the observed median tetrad difference
from zero by a significance ratio of 5 or more definitely
indicates group factors preventing the clean analysis of the variables
into one general and n specific factors.

This generalized tetrad difference criterion leaves room for improve- 
ment (aside from the fact that its proof is incomplete, though prom- 
ised) because of the following features: 

(a) It does not satisfactorily interpret the absolute amount of the 
divergence from perfect equiproportion, but rather performs the 
second function mentioned above, namely that of giving the probability 
that this amount may be due to sampling error. 

(b) It is very laborious to compute for a large table. There are
$3\binom{n}{4}$ tetrads in a table of n variables. This is 3003 for n = 14.

(c) It does not clearly isolate the group factors which may be 
present between each pair of variables, and measure their size. 
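
The labor named in (b) is easily counted. As a small sketch (function names are illustrative), the number of tetrads is three for every set of four variables, while the number of intercorrelations is one for every pair:

```python
from math import comb


def tetrad_count(n):
    """Three tetrad differences for each set of four variables."""
    return 3 * comb(n, 4)


def pair_count(n):
    """One intercorrelation for each pair of variables."""
    return comb(n, 2)


for n in (4, 6, 14):
    print(n, tetrad_count(n), pair_count(n))
```

For n = 14 this gives the 3003 tetrads cited above against only 91 intercorrelations.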

Now it has been shown by Garnett¹ as a special case of the cosine
law, that when equiproportion exists, the intercorrelation of every
pair of variables in the table can be expressed as the product of the correlations
of those variables with the general factor, g, thus:


$r_{xy} = r_{xg} r_{yg}$    (2)


This is also derivable* from the partial correlation formula which is 





1 Garnett, J. C. M.: Proceedings Royal Society, A96, 1919, pp. 91-110. 


*Spearman, C. and Hart, B.: Mental Tests of Dementia. Psychological 
Review, 1914. 
zero upon the elimination of the general factor as the residual specific
factors are uncorrelated:


$r_{xy.g} = \dfrac{r_{xy} - r_{xg} r_{yg}}{k_{xg} k_{yg}} = 0$    (3)


where $k_{xg} = \sqrt{1 - r_{xg}^2}$ is the coefficient of alienation, and similarly for $k_{yg}$.
For here the numerator must equal zero if the fraction does and
this gives (2). The $r_{xg}$ and $r_{yg}$ can be readily determined from the
intercorrelations as explained in the Appendix of Abilities of Man
already referred to.
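
A minimal sketch of formula (3) follows, assuming the correlations with g are already in hand (their determination is left to the Appendix cited). The denominators are taken as the alienation coefficients, the square root of one minus the squared correlation; the numerical values are hypothetical.

```python
from math import sqrt


def partial_r_g_eliminated(r_xy, r_xg, r_yg):
    """Partial correlation of x and y with the general factor g eliminated."""
    k_xg = sqrt(1.0 - r_xg ** 2)   # alienation coefficient of x with g
    k_yg = sqrt(1.0 - r_yg ** 2)   # alienation coefficient of y with g
    return (r_xy - r_xg * r_yg) / (k_xg * k_yg)


# Hypothetical correlations with g.
r_xg, r_yg = 0.8, 0.6

# Perfect equiproportion: the observed r_xy equals the product, equation (2).
perfect = partial_r_g_eliminated(0.48, r_xg, r_yg)

# An observed r_xy above the product leaves a residual correlation.
residual = partial_r_g_eliminated(0.60, r_xg, r_yg)

print(perfect, residual)
```

The second case leaves a residual of .25, which on the view taken here would be credited to a group factor, to sampling error, or to both.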
Under each of the $(n^2 - n)/2$ observed intercorrelations in the table, the
$r_{xg} r_{yg}$ value, expected if equiproportion were perfect, may be written,
and the $(n^2 - n)/2$ discrepancies found. These discrepancies between
the two factor hypothesis and observation are the numerators of the
ratio in (3). A measure might be built up of squared discrepancies
similar to the coefficient of contingency which summarizes the amount
of relation existing in all the cells over and above that expected by the
hypothesis of chance. But the fact that the correlation coefficients are
themselves correlated, and the further fact that this correlation surface
has not been worked out, make such a measure a difficult proceeding.
But on dividing this discrepancy by the product of the two alienation
coefficients as in (3), there results the residual correlation after the
elimination of the general factor.¹ It is proposed to use the average
of these partial correlation coefficients with g eliminated, i.e., $r_{xy.g}$, as
the coefficient of equiproportion to measure the average amount of the
group factors, and so the extent to which the variables fall short of
being perfectly expressed by one general and n specific factors.²






1 Ordinarily the partial correlation coefficient cannot be so cleanly interpreted. 
But when dealing, as here, with independent factors, additively combined and 
completely determining the dependent variable, the partial correlation with one 
factor controlled does accomplish the clean elimination of that factor and leave 
the residual raw correlation of the variable with its remaining factors. 

2 After working out this proposal, the author discovered that, as often before, 
Professor Spearman had anticipated it in explicitly suggesting the partial corre- 
lation with g controlled as an alternative criterion to the tetrad difference. This 
paper, therefore, is but an exploration of this alternative criterion and an attempt 
to point out what seems its superior features. See “Abilities of Man,” Appendix
IV (10). The nomenclature there used of calling it the “specific correlation”
seems unsuitable, for when factors are correlated they are group factors and cease
to be specific to one variable alone. This extension of the meaning of specific factor
will lead to confusion when it is wanted to denote its proper meaning of a factor
not shared by any other variable. Consequently the “partial correlation with g
eliminated” or “the group factor correlation” would seem more precise terminology.
The partial correlation with g eliminated measures in familiar
terms the absolute amount of the group factors present. Its signifi- 
cance ratio to its probable error indicates the probability of this 
correlation being due to sampling error. But it is not necessary to rest 
with this probability for a crucial test can be made. This test is to 
increase N, the number of the subjects (or if they are already suffi- 
ciently numerous to take sub-samplings). For if the partial correla- 
tion with g eliminated is due to group factors it will become more and 
more constant as N increases. But if it is due to sampling error it will 
decrease in accordance with the law of sampling error expressed in the 
probable error formula:

PE of $r_{xy.g} = .6745 \dfrac{1 - r_{xy.g}^2}{\sqrt{N}}$    (4)

On the hypothesis that the exclusive presence of one general and n
specific factors is obscured solely by sampling error, the true $r_{xy.g}$ is
zero and so the PE limit is $.6745/\sqrt{N}$. In the absence of group factors
the PE of the observed partials will never exceed this limiting value
and will tend to range just under it as N increases. So closely will
they approach this limit that often, as shown in the examples later,
they must be calculated to seven or eight decimal places to observe any
divergence from it. Therefore instead of using an average probable
error to divide into the average partial with g eliminated, it is proposed
to use this PE limit, $.6745/\sqrt{N}$, in getting the significance ratio of the
coefficient of equiproportion.¹
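
The arithmetic of this paragraph may be sketched as follows, reading formula (4) as .6745(1 - r²)/√N. The function names are illustrative; the coefficients and N's are those quoted in the examples for Bonser's data (.0284, N = 757) and for the two Civil Service forms taken jointly (.0749, N = 2599).

```python
from math import sqrt


def pe_partial(r, n):
    """Probable error of a partial correlation, formula (4) as read here."""
    return 0.6745 * (1.0 - r * r) / sqrt(n)


def pe_limit(n):
    """Limiting probable error when the true partial is zero."""
    return 0.6745 / sqrt(n)


def significance_ratio(coefficient, n):
    """Coefficient of equiproportion divided by the PE limit."""
    return coefficient / pe_limit(n)


ratio_bonser = significance_ratio(0.0284, 757)
ratio_forms = significance_ratio(0.0749, 2599)

# The crucial test: quadrupling N halves the probable error of sampling.
shrink = pe_limit(4 * 757) / pe_limit(757)

print(round(ratio_bonser, 2), round(ratio_forms, 2), shrink)
```

The ratios agree with the quoted 1.16 and 5.67 to within the rounding of the PE limit to .0132. A partial that is due to sampling error should shrink in step with the last figure as N grows, while one due to group factors should hold steady.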

The complete criterion of equiproportion proposed is then: For
four variables, either the significance ratio of the basic criterion of the
tetrad difference, or the coefficient of equiproportion, $\bar{r}_{xy.g}$, and its
significance ratio;

For more than four variables:

(a) The coefficient of equiproportion (which is the average of
equations (3)) measuring the amount of group factors and sampling
error present,








1 Strictly this PE is not the PE of $\bar{r}_{xy.g}$ for two reasons. First, $\bar{r}_{xy.g}$ is the aver-
age of the $r_{xy.g}$ partials. But the sampling error of an average of indices is less than that
of a single one of them. Hence the PE formula overstates the error and may be
used for the $\bar{r}_{xy.g}$ as well as for the $r_{xy.g}$’s. Again in using the limiting PE value
the maximal PE replaces the observed one. This obviates the necessity of aver-
aging the observed ones in summarizing the table. The error is again on the
conservative side of overstating the amount of sampling error. For both reasons
the significance ratio of the Coefficient of Equiproportion may be depended upon
(and is actually larger than given by the formulae).
(b) Its significance ratio, $1.48\,\bar{r}_{xy.g}\sqrt{N}$ ($= \bar{r}_{xy.g} / \mathrm{PE}_{r_{xy.g}}$),
measuring the probability that the observed partial correlation is due
to sampling error,

(c) And if desired, an auxiliary index, the sigma of the partials,
to measure the distribution of group factors evenly among all the
variables (when sigma = 0) or their concentration in a few variables
(when sigma > 0).

EXAMPLES 


A few illustrations have been worked out of the features of this 
criterion and comparisons with other criteria. A table of intercor- 
related anthropometric traits has been taken as one in which equipro- 
portion was extremely imperfect; another of earlier mental tests 
illustrates good equiproportion; and a third of more recent mental 
tests illustrates excellent equiproportion and at the same time its 
complication by experimentally introduced group factors in the shape 
of alternative forms of the same tests. The tables of raw correlations 
are to be found in Spearman, “Abilities of Man,” pp. 44ff.

In Table I no equiproportion is found among the anthropometric
tests. The intercolumnar correlation is -.02; the observed median


TABLE I.¹ - DOLL'S ANTHROPOMETRIC CORRELATIONS
N = 477

Test no.               1        2        3        4        5        6
Name                 Right    Left     Height   Sitting  Weight   Vital
                     grip     grip              height            capacity
r_xg                 .8016    .8630    .7684    .8277    .6941    .6362

(Partials in the lower left; PE's in the upper right.)
1                     ..      .0270    .0432    .0430    .0432    .0432
2                    .6402     ..      .0420    .0377    .0455    .0443
3                   -.2377   -.2881     ..      .0356    .0438    .0451
4                   -.2490   -.4210    .4708     ..      .0427    .0451
5                   -.2357   -.0798    .2100    .2612     ..      .0454
6                    .2385    .1822   -.1193   -.1192   -.0929     ..

Average partial
(neglecting signs)   .3202    .3222    .2652    .3042    .1759    .1504

1 The partial correlation coefficients are in the lower left triangle of the table, 
and their probable errors are in the upper right triangle. 
tetrad difference is more than 10 times the probable error expected
by random sampling alone: and the correlation due to the group factors 
after eliminating g is .2564. This latter value is over 8 times the 
probable error and would occur by random sampling only once in 
100,000,000 times. These variables then, after eliminating the 
general factor, still contain large group factors which cannot possibly 
be attributed to sampling error. 
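
As a check on the figures just quoted, the coefficient of equiproportion for Table I can be recomputed from the fifteen partials in its lower triangle. This is a sketch under the reading of the table given here; the PE limit is taken as .6745/√N with N = 477.

```python
from math import sqrt

# Partial correlations with g eliminated, lower triangle of Table I.
partials = [
    0.6402,
    -0.2377, -0.2881,
    -0.2490, -0.4210, 0.4708,
    -0.2357, -0.0798, 0.2100, 0.2612,
    0.2385, 0.1822, -0.1193, -0.1192, -0.0929,
]

n_subjects = 477

# Average partial, neglecting signs.
coefficient = sum(abs(p) for p in partials) / len(partials)

# Significance ratio against the limiting probable error.
ratio = coefficient / (0.6745 / sqrt(n_subjects))

print(round(coefficient, 4), round(ratio, 1))
```

The average comes to .2564 and the ratio to a little over 8, in agreement with the text.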


TABLE II.¹ - BONSER'S MENTAL CORRELATIONS
N = 757

Test no.                  1             2              3              4            5
Name                 Mathematical   Controlling     Literary        Select      Spelling
                       judgment     association   interpretation   judgments
r_xg                     .701          .672           .607           .550         .398

(Partials in the lower left; PE's in the upper right.)
1                         ..           .0363          .0363          .0364        .0363
2                      [illeg.]         ..            .0364          .0363        .0363
3                       -.046        [illeg.]          ..            .0364        .0363
4                        .018          .044         [illeg.]          ..          .0363
5                        .024         -.029           .044          -.031          ..

Average partial
(neglecting signs)       .029          .030           .028           .024         .032

1 Partials in lower left, PE’s in upper right sections. 


In Table II good equiproportion is found in Bonser’s correlation 
of some earlier intelligence and schooling tests. The intercolumnar r 
is .96; the observed median tetrad difference is only 1.18 times the 
probable error of sampling; the coefficient of equiproportion giving the 
average raw correlation of the group factors, plus sampling error, is 
.0284. This has a significance ratio of 1.16 times the probable-error- 
limiting-value obtainable by sampling. Note that the significance 
ratio of the coefficient of equiproportion is almost the same as that of 
the tetrad difference criterion, as it should be, since the methods are 
based on the same formulae. The slight correlation here after eliminat-
ing g would occur by random sampling 44 times in 100 and hence
there is no ground for supposing that it indicates any group factors
at all. Note how closely the probable errors of the observed partials in
the table approach the probable error limit, which would hold were
there no group factors but only random sampling error present. The
observed and expected PE’s differ only in the fourth decimal place.
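
The probabilities attached to the significance ratios can be approximated as below. The paper does not state how they were obtained; this sketch assumes a normal sampling distribution, takes one probable error as .6745 of a standard deviation, and counts deviations in either direction.

```python
from math import erfc, sqrt


def prob_due_to_sampling(significance_ratio):
    """Chance of a deviation at least this many probable errors from zero.

    Assumes a normal distribution and counts both tails.
    """
    z = 0.6745 * significance_ratio   # probable errors -> standard deviations
    return erfc(z / sqrt(2.0))


p_unity = prob_due_to_sampling(1.0)     # "as often as not"
p_bonser = prob_due_to_sampling(1.16)   # Bonser's table
p_form_a = prob_due_to_sampling(0.322)  # Civil Service, Form A
p_form_b = prob_due_to_sampling(1.23)   # Civil Service, Form B

print(round(p_unity, 2), round(p_bonser, 2), round(p_form_a, 2), round(p_form_b, 2))
```

Under these assumptions the results fall within about one in a hundred of the figures quoted for the mental test tables (44, 83 and 41 in 100).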


TaBLe III.1—SpearMan’s Crvit Service CORRELATIONS 
Form A ‘“Selective,”’ or Controlled Response Type; N = 2599 






































EY els Ba oaks ws 2 0d 1A 2A 3A + 
SN he Cs oie skin's iar Completion; Analogies | Passages | Instructions 
iva: ea chee e es ok eed .7306 oh, 5640 .5954 .5900 
mae em” rome, . 1322762 | .01322791 | .01322489 
PE’s 
2A ne Wachee ts 0 1322799 | .01322799 
3A —.0036 |—.0027 | ......... _ .01322765 
NE Pig se fA ce =e Partials aE : 
4 — .0038 |—.0027 .0057 
Average partial............. .0048 .0041 .0040 .0041 
MG Rak ne cs boas nachon .0132 .0132 .0132 .0132 

















(a) Coefficient of equiproportion = average partial r of the table = .00425 = 
average raw correlation due to group factors and sampling error. 


(b) PE maximum limit = .01322808 = $.6745/\sqrt{N}$.

(c) Significance ratio = .322 = a/b. 

(d) Probability of the coefficient of equiproportion being due to sampling 
error = 83 in 100. 


1 The partial correlation coefficients with g eliminated are given in the lower left 
triangle of the table, and their probable errors are given in the upper right triangle. 

2 The tests as given in Spearman, “Abilities of Man,” p. 153 have been renum-
bered to distinguish the two forms. The tests here 1A, 2A, 3A, 4, 1B, 2B, 3B are
there 1, 4, 7, 5, 3, 2, 6 respectively. Test 4 (Spearman No. 5) only had one form 
and was introduced to get the necessary fourth test with which to calculate tetrad 
differences in each form group. 


In Table III is shown a triumph of precise experimentation in 
psychology. The tests were built to measure g with as little group 
factor as possible. Due to using the best types of verbal intelligence 
tests, care in construction and administration, controlled response 
(multiple choice) form of response, and a large number of subjects 
(N = 2599) the resulting equiproportion is almost perfect. The 
average correlation of the group factors together with the sampling 
error is .00425 which is the coefficient of equiproportion. As this 
amount is only about a third of the probable error of pure sampling,
it has a probability of being due to that cause alone of 83 in 100. Thus
even this vanishingly small correlation is not likely due to group
factors at all. The probable errors of the observed partials have to be
calculated to seven and eight decimal places, as shown, before any 
discrepancy from the theoretical probable error of sampling without 
group factors is revealed. Herewith psychological experimentation 
begins to achieve the precision of the older sciences of physics and 
chemistry! If reported to the conventional two decimal places, every 
partial correlation coefficient but one in the table would appear as 
zero, indicating complete absence of both group factors and also of 
significant sampling error. 


TABLE IV.¹ - SPEARMAN'S CIVIL SERVICE CORRELATIONS
Form B “Inventive,” or Free Completion Response Type; N = 2599

Test no.               1B            2B            3B            4
Name               Completion    Analogies     Passages     Instructions
r_xg                 .6792         .7130         .6106         .6190

(Partials in the lower left; PE's in the upper right.)
1B                     ..          .0132         .0132         .0132
2B                  [illeg.]        ..          [illeg.]       .0132
3B                   .0263         .0011           ..          .0132
4                    .0010         .0285        -.0241           ..

Average partial      .0197         .0104         .0172         .0179
PE [row label
illegible]           .0132         .0132         .0132         .0132

(a) Coefficient of equiproportion = .0163 = average raw correlation due to
group factors and sampling error.


(b) PE maximum limit = .0132 = $.6745/\sqrt{N}$.
(c) Significance ratio = a/b = 1.23. 
(d) Probability of (a) being due to sampling error = 41 in 100. 


1 See footnotes to Table III.


In Table IV the same tests on the same subjects are analyzed. 
But this time the tests were thrown into the inventive, or free comple- 
tion type of response. This apparently introduced some subjective 
scoring elements which act as group factors in slightly increasing the 
partial correlations with g eliminated. Although the differences from 
Table III are not statistically significant, yet there is an indication 
that the controlled response type measures g with greater precision
than the inventive response type with its scoring variables. 

TABLE V. - SPEARMAN'S CIVIL SERVICE CORRELATIONS
Form A vs. Form B (Introducing Group Factors Due to Alternative Forms of the
Same Tests); N = 2599

Test no.                    1A            2A            3A
Name                    Completion    Analogies     Passages
r_xg¹ [two rows;          .7973         .6059         .5986
row labels illegible]     .7363         .7389         .6419

(Partials of each Form B test with each Form A test.)
1B                        .0464        -.0968        -.0585
2B                       -.1802         .0603        -.0821
3B                       -.0718        -.0687         .0094

Average partial²          .0833         .0914         .0500

(a) Coefficient of equiproportion = .0749 = average raw correlation due to 
group factors and sampling error. 


(b) PE maximal limit = $.6745/\sqrt{N}$ = .0132.

(c) Significance ratio = a/b = 5.67. 

(d) Probability of (a) being due to sampling error = 2 in 10,000. 

1 Determined by formula (19) of Spearman, “Abilities of Man,” and aver-
aging the three such ratios for each form.

2 Throughout these tables the average partial is taken without regard to sign, 
to indicate the absolute size of the group factor correlation. The average partial 
here is the average of both forms from the three rows and the three column cells. 


In Table V a very neat experimental situation has enabled the 
isolation of group factors due to alternative forms (selective and 
inventive) of the same tests. Although these tests are known from 
Table III and Table IV to give excellent equiproportion with no signifi-
cant group factors, yet when the calculation is based on both forms
jointly there appears a coefficient of equiproportion measuring the cor- 
relation of group factors of .0749. As this is 5.67 times the probable 
error of sampling in the absence of group factors, and would occur once in 
5000 times by chance, this correlation indicates significant group factors. 
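
The averaging described in footnote 2 of Table V can be reproduced from the nine cross-form partials. This is a sketch only; it takes the average for each test over its row (Form B against the three Form A tests) and its column (Form A against the three Form B tests), neglecting signs, and the variable names are illustrative.

```python
# Partials of Table V: rows are tests 1B, 2B, 3B; columns are 1A, 2A, 3A.
cross = [
    [0.0464, -0.0968, -0.0585],
    [-0.1802, 0.0603, -0.0821],
    [-0.0718, -0.0687, 0.0094],
]


def average_partial(i):
    """Mean magnitude over row i and column i (six cells, as in footnote 2)."""
    row = [abs(v) for v in cross[i]]
    column = [abs(cross[j][i]) for j in range(3)]
    return (sum(row) + sum(column)) / 6.0


averages = [average_partial(i) for i in range(3)]
coefficient = sum(averages) / len(averages)

# Significance ratio against the rounded PE limit of .0132 used in the table.
ratio = coefficient / 0.0132

print([round(a, 4) for a in averages], round(coefficient, 4), round(ratio, 2))
```

The three averages and their mean agree with the .0833, .0914, .0500 and .0749 of the table, and the ratio with the 5.67 reported.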

It is worth noting that the effect of group factors seems to be to
increase the $r_{xg}$’s, the correlation of each test with the general factor,
g, as determined from all the tests. Part of the tangled mass of group
factors seems to be appropriated by the formula as a generalizable
factor, leaving the balance as a structure of positive and negative¹ group





1 For a pioneer exploration into negative group factors, or “interference
factors,” and their correlations see Thompson, J. R.: British Journal Psychology,
Vol. X, 1920.
factors as shown by the partials. Note that, in Table V, the $r_{xg}$’s
differ markedly according as they are determined from one form or the
other, and these again from the “pure” forms of Tables III and IV.
This illustrates the chief defect in the procedure of partialling g out
to get the coefficient of equiproportion as a criterion of equiproportion.
For in proportion as the variables contain group factors the determina-
tion of $r_{xg}$ will become less accurate (increasing it) and the resulting par-
tials will be less accurate. They will still indicate by their general size
how imperfect the equiproportion is and which are the offending variables
but they will not measure the size of the group factors so reliably.¹

It should be remembered that a given set of correlations can be
synthesized by group factors in an enormous number of sizes and
arrangements. The particular group factors here analyzed out are
determined by the formulae for getting $r_{xg}$ and then $r_{xy.g}$. When
the basal $r_{xg}$ becomes changed the ensuing calculations will change
accordingly. The intricacies of this are very puzzling as shown by
much further data worked out, but not here published. More research
is needed on possible structures of group factors and their effects on
the correlation coefficients. At present, however, the accuracy of the
$r_{xg}$ determination can be increased in many ways. For example, after
working out the partials, as in the above tables, those variables showing
largest group factors might be dropped and the $r_{xg}$’s recalculated from
the others. The method of “reference values” is described by Pro-
fessor Spearman in the Appendix of his book, “The Abilities of Man.”
Of course the principal method is so to build the tests and conduct the
experiment that group factors will not appear, if g is to be measured.

In Table VI a summary of the foregoing discussion and indices 
from the three criteria of equiproportion on the various sets of data 
are given. The chief fact to note here is that the three criteria agree, 
even the approximative intercolumnar correlation gives a dependable 
verdict. The tetrad difference criterion gives the same order of 
probability that an observed imperfection in the equiproportion is due 
to sampling error (and therefore not to be attributed to group factors) 





1 This in part accounts for the divergent significance ratio given by the tetrad 
difference criterion and the coefficient of equiproportion criterion in Table VI, the 
rows for Tables V and I. Another part of this divergence in rows for Tables III, 
IV and V is due to a slightly differing combination of the tests. Thus for the 
tetrad difference criterion for the Table V analysis, the Test No. 4, Instructions 
(which had only one form) was required for a fourth test, while for the coefficient of 
equiproportion criterion it was left out in order to isolate the group factors due to 
alternate forms of the same tests more purely. 
as the coefficient of equiproportion. But the latter also affords a
measure of the absolute size of the group factors in familiar correlation
terms (compare Tables II and III). There is no doubt after inspecting
the vanishingly small partial correlations of Table III that this is a
more nearly perfect equiproportion than in Table II. Yet because
Table II is based on one-third as many cases, the PE of its tetrad
differences is larger, and by this criterion the two tables show a
significance ratio that is alike. But the coefficient of equiproportion
$\bar{r}_{xy.g}$ distinguishes the greater perfection of Table III.

Since the original data, or the correlations based on a larger, or a 
smaller, number of subjects, N, were not available for the data of 
Tables I-V, the crucial test, going beyond the mere probability, that 
the apparent group factors were really group factors and not sampling 
error, could not be made here. 


SUMMARY 


From these illustrations of the analysis of the variables made 
possible by the coefficient of equiproportion criterion, its features may 
be summarized, as follows: 

1. For many variables it is much less laborious to calculate than 
the tetrad difference criterion. For 14 tests there are 3003 tetrads 
to work out, but only 91 partials, or one for each correlation coefficient. 

2. It not only measures the probability that the observed imperfec-
tion of equiproportion is due to sampling error (and so the complementary
improbability that it is due to group factors), but it also provides
a measure of the absolute size of the group factors, if they exist.

3. It further identifies the precise pair of variables in which the 
group factors exist more simply than by a complicated comparison 
of various tetrads. 

4. It is an analysis which would be undertaken in any case after 
the application of the tetrad difference criterion in order to explore 
the constitution of the variables. It thus does not involve the extra
work of working out the criterion as do the former criteria.

5. It provides a crucial test to distinguish between imperfect 
equiproportions being due to sampling error or to group factors. 
This test is the constancy, or decrease of the partials in proportion to
$1/\sqrt{N}$, as N increases.

6. It is based on familiar and well established formulae throughout.

7. It possesses the disadvantage of measuring the group factors
with decreasing accuracy as these grow larger and make the determina-
tion of $r_{xg}$ less accurate.
INFLUENCE OF EDUCATIONAL ATTAINMENT UPON
TESTS OF INTELLIGENCE 


FRANK S. FREEMAN
Department of Education, Cornell University 


I 


This study was undertaken as a repetition of Burt’s¹ investi-
gation, as a result of which he derived his now much discussed regres-
sion equation. It will be recalled that Burt set out in his study to 
determine the influence of school achievement upon an individual’s 
performance on the Binet scale. In the present investigation the 
Stanford-Binet was used, and several extensions were made. In the 
first place, coefficients of correlation and regression equations were 
obtained for the Dearborn Group test as well as for the Binet. And 
after the combined scores of school work were studied, the separate 
results in arithmetic, reading rate, and reading comprehension were 
considered independently with each test of intelligence. Thus the 
present investigation presents coefficients of correlation and regression 
equations for the Binet with the Burt Reasoning test, combined 
school work scores, arithmetic, reading rate and reading compre- 
hension. There are also similar calculations for the Dearborn Group 
test. Incidentally, some interesting data stand out regarding the 
Reasoning test. 


II 


Burt concluded, as a result of his study, that successful perform- 
ance on the Binet scale depends on, or is “‘attributable”’ to school 
achievement largely, if not mainly; further, that pure intelligence (as 
represented by performance on the Reasoning test) “contributes” 
only one-third of the total, that is, only a little more than half the 
amount for which schooling is responsible; and that chronological age 
is an almost negligible factor in the final Binet score.² Thus Burt
holds that the Binet and similar tests cannot be regarded as true 
measures of intelligence. So to regard them is to do the individual 
an injustice in some instances or, in others, to give him an undeservedly 
high rank where home, social, or racial influences have been conducive 





1 Burt, C.: “Mental and Scholastic Tests.” Pp. 180ff.
2 B = .54S + .33I + .11A.
to the acquisition of information such as is acquired in the schools
and tested with the Binet. 

But it seems advisable to point out certain a priori criticisms 
made against the Binet scale. Burt feels that “a host of subsidiary
conditions” inevitably affects the score. Among these conditions
are zeal, industry, good-will, emotional stability, scholastic informa-
tion, the accidents of social class, the circumstances of sex. “These
irrelevant influences, in one case propitious, in another prejudicial,
improve or impair the final result.” Looking over this list, it must
be apparent that tests of intelligence do not pretend to meas-
ure zeal, industry, good-will, or emotional stability. While it is very
likely true that these qualities receive less attention in group tests 
than in individual tests, it is reasonable to suppose that in giving the 
Binet, a qualified examiner will know when the test results are vitiated 
by deficiencies in any of them. And, of course, no results should be 
seriously considered unless found by a competent examiner. 

As for the accident of social class, we may point out that there are 
no data to show that it is a marked factor, but that, on the other 
hand, the most recent results reported by Terman and others in 
“Genetic Studies of Genius” throw a heavy burden upon the environmental
hypothesis. Much the same may be said of the circumstances
of sex. The influence of the other condition—scholastic informa- 
tion—Burt’s study and this one set out to examine. 

Before considering the correlation and regression coefficients, 
certain aspects of the Burt investigation must be touched upon. The 
variables used were chronological age; school attainment measured 
by an educational examination, the results being revised by the 
teachers; intelligence measured by the Reasoning test, the results 
again being checked by the teachers; and finally the mental age found 
with the Binet-Simon scale, modified and standardized for English 
use. The striking thing about these variables is the fact that the 
results in the tests of school attainment were revised by the teachers. 
To subject findings to such treatment is to defeat the purpose of a 
properly standardized test. The same criticism, and with greater 
emphasis, is justified against the checking of results obtained with 
the Reasoning test, for this measure forms the basis of Burt’s study; 
it is regarded by him as a valid test of pure intelligence. 

A detailed examination of the Reasoning test cannot be given 
here, but attention should be called to its decided linguistic nature, 
and its lack of variety in the character of the questions. 
The method of scoring and its lack of objectivity are significant.
The child reads the problem and is asked to give the answer, after
which he must supply a reason for his answer. If both answer and
reason are correct, he is credited with one point. If the answer is
incorrect, he is asked to try again either until he succeeds or until he
fails in four successive attempts. An inspection of the problems will
show that in many of these the child is bound to strike the correct
answer in four attempts, or even in fewer, in some cases. Of course, the
correct answer is not the sole basis for scoring; the reason is considered 
as well. But it is quite conceivable that an individual will flounder 
about, will give something of a reason and, therefore, will receive 
some credit. But in scoring the reason we encounter still further 
possibilities of difficulty. One-fourth of a point is deducted for an 
ill-expressed reason; one-half for an inadequate reason, and three- 
fourths for no reason at all. Now in order to establish uniformity in 
the scoring of reasons, it is necessary that there be sample replies 
which would fall within these three categories—much the same as we 
have for the Stanford-Binet. What for one examiner will be a well 
expressed statement might be a poor expression for another. And 
in demanding a well expressed reason are we not placing an undue 
premium upon language ability? It would certainly seem so. The 
matter of opinion applies likewise to scoring the adequacy or inade- 
quacy of a reason. The element of subjective judgment as a source 
of error has been too frequently indicated to need repetition or great 
emphasis here. But it is an element of marked significance in this 
instance, for Burt has built up an equation in which the criterion of 
intelligence is a decidedly linguistic reasoning test, whose scoring is 
open to the influence of personal opinion. There can be no doubt 
that the age norms for the Reasoning test would be more satisfac- 
tory if the possibilities for variable errors did not appear so great as 
they do. 


III


Turning now to Burt’s table of correlations,1 we find that the
highest figure is the zero-order coefficient for the Binet and school
work: namely, .91. When both chronological age and the Burt test
are held constant, the partial coefficient for the Binet and school
work is .61. This, of all second-order coefficients, is the highest.
Therefore, Burt concludes that the Binet score is an index largely, if 





1 Op. cit., p. 182. 
not mainly, of the individual’s mass of scholastic information. It is
true that .61 is a moderately high coefficient; but even in the case of 
partial correlations we cannot conclude that the coefficient necessarily 
indicates a definite causal relationship. All we can say is that certain 
factors have been held constant as far as possible, and that with these 
constant there still exists a marked relationship. In this instance, 
we may say that, holding constant intelligence as manifested in the 
Burt test, the relationship between the Binet and school work is still 
marked, and that successful performance on one is noticeably depend- 
ent upon the same functions as make for success on the other. But 
in so saying we have in no way described those functions, nor have 
we accounted for the successful performance. 

It should also be noted that with school work and age held constant, 
the Binet and the Burt show a partial coefficient of .56. From this 
it may be concluded that the degree of successful performance on the 
Binet is an index to a marked extent of the ability possessed by an 
individual for success with the Burt test. Then, too (if we accept 
Burt’s interpretation of the coefficient of .61), it might be said that the 
Burt is likewise dependent upon school achievement; for if the Binet 
correlates well with school work, and if the Burt test correlates well 
with the Binet, may it not be maintained that the Burt is dependent 
upon schooling if the Binet is so dependent? 

Further, when age is held constant, the Binet and the Burt tests 
show a coefficient of .65. We have in this index another indication of 
the possible dependence of both measures on the same or similar 
functions. And again, then, may we not point to the Reasoning test 
as also being an index of school achievement, if that is true of the 
Binet? But this interpretation is by no means being insisted upon. 
The point is that we may logically be led into conclusions not neces- 
sarily warranted. Inspection of Burt’s tables will show other coeffi- 
cients—many of them high—which are of significance. It does not 
seem that anyone is justified in selecting as important only the coeffi- 
cients of .61 for the Binet and school work (Burt and age constant) and 
—.07 for the Burt and school work (Binet and age constant). There 
are others which merit consideration. 


IV 


As stated before, it was the aim of the present study to repeat 
Burt’s investigation as far as possible; and in addition to carry out 
the experiment with a representative group test (Dearborn) in order 
to derive an equation for such a test. Furthermore, this study will
give some data on the standards and suitability of the Reasoning 
test for the examination of American school children. 

The variables used here are the scores obtained with the Stanford- 
Binet (or the Dearborn Group), the scores of the Burt Reasoning test, 
and the composite scores of the school achievement tests. It will be 
recalled that Burt had a fourth variable, chronological age. However, 
in undertaking this experiment it was felt that instead of holding 
chronological age constant by means of partial correlations, it would be 
much more satisfactory to select subjects of one age at the beginning. 
Thus, the 77 pupils examined were all nine years of age. The mean 
CA is 9 years 6.4 months, with a mean deviation of 2.9 months. These 
pupils were not in one grade; they ranged from the upper half of Grade 
II to the upper half of Grade V. This distribution indicates that the 
school achievement of the subjects was not uniform; and this selection 
makes possible a greater difference in their performance on the tests 
of intelligence, if these tests depend upon school learning. 

Other considerations entered into the selection of these children. 
As far as possible, pupils were taken in schools where there was likely 
not to be a preponderance of one social level or of one race. Further- 
more, where it was known that a child was suffering from a physical 
defect, or where some other handicap had intervened and might 
retard his development—that child was not taken for this study. In 
general then we have here a rather representative group which has had 
an opportunity for normal development. 

Throughout, the testing program was so organized that no two 
scores of any child were obtained more than a month apart. In most 
instances, as a matter of fact, all the data for a pupil were gathered 
within several weeks. This was especially the case in getting the 
mental ages on the three tests of intelligence where it was desired to 
have them as close together as possible. 

The variables employed in the present study are, then, as follows: 


1. Intelligence as measured by the Stanford-Binet (or by the Dearborn group 
test). 

2. Intelligence as measured by the Burt Reasoning test. 

3. School achievement as measured by a test in reading ability (Dearborn- 
Westbrook) and by a test in arithmetic (Peet-Dearborn, problems). 


It is necessary to note here that educational attainment in this 
experiment will not be expressed in terms of educational age. Instead, 
the rank of each individual was obtained in terms of percentile
score. Although this is a departure from Burt’s method,
it is not an essential difference; for, since correlation is a matter of 
relative rank, it matters little whether we translate the raw score into 
educational age or percentile rank. In fact, the latter is probably 
more nearly exact. However, in interpreting regression coefficients 
and equations, it is necessary to take into account the units in which 
each variable is expressed, if comparisons of regression coefficients are 
to be made. For this reason, the equations here presented were 
worked out with the standard deviation of each variable in percentile 
rank units, so that their coefficients are directly comparable. 


V

The first set of correlations and equations deals with the Stanford- 
Binet, the Burt and the school achievement tests. The mean mental 
age for the Binet is 9 years 8.2 months, with a standard deviation of 
12.1 months; while the mean IQ is 103.5, with a standard deviation of 
10.6. The range of mental ages is from 7-8 to 13-1; for IQ’s, from 77 
to 137. It appears from these figures that the children being dealt 
with here show a rather good distribution. 

The results of the Burt Reasoning test for these same pupils are 
rather striking. The mean mental age is 8 years 5.1 months, with 
a standard deviation of 15.8 months. The mean IQ is 89; its standard 
deviation 13.7. The range in MA is from 6-6 to 11-6; in IQ from 67 to 
127. As may be inferred from these figures, the distributions of 
mental ages and intelligence quotients obtained with the Burt show a 
marked positive skewing. In fact, with the Reasoning test 75 per 
cent attain mental ages below the mean chronological age. These 
MA’s and IQ’s are significant. The norms for the Reasoning test 
were established by examining English school children. Yet, when 
the test is applied to a group of representative nine-year old American 
pupils, it is found that they show a mean mental age of less than eight 
and one-half years, and a mean intelligence quotient of 89. 
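
The relation between these mean mental ages and mean quotients can be checked roughly. The sketch below assumes the conventional ratio definition, IQ = 100 × MA / CA, which the article does not spell out, and applies it to the reported means. The function names are illustrative only. Because the mean of individual ratios is not the ratio of the means, the results can only approximate the printed mean IQ's.

```python
def months(years, extra_months):
    """Convert an age given as years and months into months."""
    return 12 * years + extra_months


def ratio_iq(mental_age_months, chronological_age_months):
    """Ratio IQ under the assumed definition IQ = 100 * MA / CA."""
    return 100.0 * mental_age_months / chronological_age_months


# Means reported in the text (CA 9-6.4, Burt MA 8-5.1, Binet MA 9-8.2).
mean_ca = months(9, 6.4)
burt_mean_ma = months(8, 5.1)
binet_mean_ma = months(9, 8.2)

# Ratio of the means; an approximation to the mean of individual IQ's.
burt_iq_of_means = ratio_iq(burt_mean_ma, mean_ca)
binet_iq_of_means = ratio_iq(binet_mean_ma, mean_ca)

print(f"Burt:  {burt_iq_of_means:.1f}")
print(f"Binet: {binet_iq_of_means:.1f}")
```

On this assumption the Burt figure falls a little under the reported mean IQ of 89, and the Binet figure a little under the reported 103.5, which is the size of gap one would expect when a ratio of means stands in for a mean of ratios.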

If the Reasoning test and its norms are correct, then one of two 
things may be assumed. Either English school children are mentally 
superior to American children of like chronological age, or else the English 
pupils used for purposes of standardization were a select group on the 
upper levels; hence, the test is too difficult. It might also be claimed 
that the Burt test is a measure of special ability—which, of course, 
does not necessarily explain the discrepancy between the scores of the 
English and the American children. If it is a test of special ability, 
and if the English and the American pupils are of about equal native
ability, then might it not be held that performance on the Reasoning 
test is dependent on a special type of training? 

It should be added that the writer of this paper is not definitely 
drawing these conclusions, but merely indicating some possibilities 
which suggest themselves after examining the results of the tests. 


VI 


The coefficient of correlation for the Binet with the Burt test is
.77 ± .03. This is indeed high and, in fact, only just below the figure
found by Burt himself: namely, .84. These two coefficients illustrate 
very well the fact that correlations, even when high, should not be 
regarded as indicating the score on one test when the score on another 
is known. An examination of the mental ages of the two tests shows 
a median difference of —16 months and a Q of 7 months, the range 
being from 8 to —39 months. (In each case the difference is the num- 
ber of months by which the Burt MA varies from the Binet MA for the 
same individual.) It seems from these marked discrepancies that if 
the Stanford-Binet is a true measure of intelligence, the Burt is not; 
or else the latter is only a part measure of a special aspect of intelligence.
If the Reasoning test is a measure of pure intelligence, then it is prob- 
ably not adequately standardized—at least for use with American 
children—for it is not likely that the subjects of this investigation are 
actually members chiefly of the so-called “dull normal” group. If 
the Burt mental ages are correct, and if the American children are 
representative, then indeed our schools have many more cases of 
acceleration rather than retardation, from the point of view of mental 
age. But investigations have shown unquestionably that errors in 
classification bear more heavily on bright pupils than on dull pupils. 
It cannot be maintained that schools are adjusted for the dull or back- 
ward group. 

The correlation for the Binet with the percentile scores in the school 
achievement tests is .77 ± .03. This coefficient, though high, is well
below the .91 found by Burt. However, both clearly show the value 
of the Binet for predicting school success. But it is of interest to 
observe that this coefficient of .77 is only two points greater than that 
obtained by Burt for school work and the Reasoning test. 

The coefficient for the Burt test and school work is .72 ± .03, five
points below that for the Binet and school work, and three points below 
that reported by Burt. 
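
The figure following each coefficient reads like a probable error. As a rough check, the sketch below assumes the classical formula P.E. = .6745(1 − r²)/√N with N = 77 pupils. The article does not state which formula was used, so this is an assumption, and the function name is illustrative.

```python
import math


def probable_error_r(r, n):
    """Probable error of a correlation coefficient (assumed classical formula)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


N = 77  # number of pupils examined in this study

for r in (0.77, 0.72, 0.80):
    print(f"r = {r:.2f}   P.E. = {probable_error_r(r, N):.3f}")
```

Under this assumption the values come out near .03 for r = .77 and near .028 for r = .80, close to the printed figures. The agreement for r = .72 is looser, so the printed values may have been rounded or truncated differently.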
Thus the coefficients found by Burt (Binet and school work .91,
Burt and school work .75) and in this study (Binet and school work .77, 
Burt and school work .72) indicate that both tests of intelligence show 
essentially the same relationship to school achievement, although the 
coefficient of .91 does make it appear that the Binet is a more reliable 
instrument for predicting school success. This greater reliability is 
further emphasized when it is recalled that the Burt mental ages are 
well below those of the Binet, while the latter correspond far better 
with the grade work actually being done by the children who were 
examined. 


The partial correlations for the three variables thus far considered 
are: 


Binet and Burt (school work constant)...................... .49
Binet and school work (Burt constant)...................... .49
Burt and school work (Binet constant)...................... .31


Reference to Burt’s study will show that the second-order coefficient
for the Binet and school work (Burt and CA constant) is .61,
whereas for the Burt and school work (Binet and CA constant) the
coefficient is −.07. It was chiefly on these two coefficients that Burt
based his conclusions that the Binet is an index largely of an individual’s
school attainment, while the Reasoning test measures pure intelligence, 
free from the influence of schooling. But it will be noted that the 
present coefficients do not agree with Burt’s. In place of the .61 
found by Burt, this experiment yields a coefficient of .49; and instead 
of —.07 for the Reasoning test, we find here +.31. 
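
The three partial coefficients of this study can be recomputed from the zero-order correlations reported above (.77, .77 and .72). The sketch below uses the standard first-order partial correlation formula. The function and variable names are illustrative and do not come from the article.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Zero-order correlations reported in this study.
r_binet_burt = 0.77
r_binet_school = 0.77
r_burt_school = 0.72

binet_burt_given_school = partial_r(r_binet_burt, r_binet_school, r_burt_school)
binet_school_given_burt = partial_r(r_binet_school, r_binet_burt, r_burt_school)
burt_school_given_binet = partial_r(r_burt_school, r_binet_burt, r_binet_school)

print(round(binet_burt_given_school, 2))
print(round(binet_school_given_burt, 2))
print(round(burt_school_given_binet, 2))
```

Rounded to two places, these agree with the partial coefficients listed above for this study.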

It does not appear that Burt’s conclusions are justified. The fact 
that two variables show a high positive correlation does not necessarily 
mean that one is therefore the cause of or dependent upon the other, 
even when several other variables are held constant as far as possible. 
The partial coefficient of .49 (.61 in Burt’s study) is an indication that 
successful performance on the Binet and in tests of school work depends 
to a fairly marked degree on the same mental factors, even when the 
kind of intelligence measured by the Burt test is removed as a possible 
factor in the determination of scores. Yet, a coefficient of .49 or .61 
is sufficiently far removed from unity to make very doubtful the 
interpretation that one variable depends on or is responsible for the 
other. 

The first-order coefficient +.31 found in this study for the Reasoning 
test and school work is altogether different from the —.07 reported by 
Burt. The latter coefficient, being so close to zero, suggests that there 
is practically no relationship between ability as displayed on the
Reasoning test and in school work. The +.31, however, does point to
the conclusion that there is a definite though low degree of relationship
between the two. The correlation in this instance, smaller than that
with the Binet (.49), might possibly be explained by the fact that the
Binet is varied in character, whereas the Burt is uniform, being
linguistic and examining only the subject’s ability to understand
relationships.

Using the partial correlations for this study quoted above, and the
proper standard deviations, the following regression equation is found:


B = .47Bu + .54S

where B is the Binet score,
Bu the Burt score, and
S the score in tests of school work.

This equation was worked out by using the percentile scores of all
three variables. Therefore, the regression coefficients are directly
comparable (as in the case of Burt’s study where his measures were all
in terms of “age”). Now, it is not the absolute size of the regression
coefficient that is significant, for that will vary with the unit of measurement
or scoring employed. But in making comparisons, the relative
sizes of the regression coefficients within the same equation are important
in determining the weight of each variable in calculating the
probable score of the dependent variable. This does not mean, however,
that in finding the weights of the variables we have indicated
their respective “contributions,” for regression coefficients do not
necessarily show the extent of causal relationship.1 It is significant to
note here that even were we inclined to give the interpretation of the
equation offered by Dr. Burt and Professor Thomson, we could not

TABLE I. SUMMARY OF CORRELATIONS FOR THIS STUDY
(Binet, Burt and School Tests)
Binet with Burt............................................ .77
Binet with school work..................................... .77
Burt with school work...................................... .72
Binet with Burt (school work constant)..................... .49
Binet with school work (Burt constant)..................... .49
Burt with school work (Binet constant)..................... .31

1 For an interesting discussion of the interpretation of the regression equation,
the reader is referred to The Journal of Educational Psychology, Dec., 1925, May
and Sept., 1926. There Holzinger and F. N. Freeman, on the one hand, and
Thomson on the other take issue on the matter.
reach their conclusions, because the regression coefficients in this
instance do not agree with those found by Burt.1 The foregoing
equation and coefficients of correlation do not confirm the view that 
the Binet is a test chiefly of school achievement. 
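
The dependence of regression weights on the unit of measurement can be illustrated with a short sketch. It assumes the usual least-squares formulas for two predictors: standardized weights are computed from the three zero-order correlations, and each is then rescaled by a ratio of standard deviations. The article does not report the standard deviations of the percentile scores, so the rescaling call at the end uses purely hypothetical values, and all names are illustrative.

```python
def standardized_weights(r_y1, r_y2, r_12):
    """Standardized regression weights of y on two predictors."""
    denom = 1.0 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2


def raw_coefficient(beta, sd_y, sd_x):
    """Rescale a standardized weight into the units of y per unit of x."""
    return beta * sd_y / sd_x


# Binet (y) on Burt (1) and school work (2), correlations from this study.
beta_burt, beta_school = standardized_weights(0.77, 0.77, 0.72)
print(round(beta_burt, 3), round(beta_school, 3))

# Hypothetical standard deviations, only to show the effect of the units.
print(raw_coefficient(0.5, sd_y=20.0, sd_x=10.0))
```

With equal correlations of .77 the two standardized weights are necessarily equal, at about .45 each. The printed weights of .47 and .54 would then reflect unequal standard deviations among the three sets of scores, although the article gives no figures by which this could be confirmed.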


VII


Turning now to the phase of this study dealing with the Dearborn 
group test in place of the Binet, we find the following data: 


Mean MA (Dearborn)................... 10 years 3.5 months
Standard deviation................... 19.6 months

Mean IQ (Dearborn).................... 107.3
Standard deviation.................... 17.1


Thus both the mean and the standard deviation found with the 
group test are about 7 months higher than the corresponding meas- 
ures for the Binet. In view of the low mental ages shown by the 
Burt test, the above measures of central tendency and variability 
indicate, of course, somewhat greater discrepancies between the 
Dearborn and the Burt tests than those between the Binet and the 
Burt. The median difference is —22 months, and the quartile 
deviation is 8 months. 

The correlation between the Dearborn and the Burt is .80; for the 
Dearborn with school work it is .73. This group test, then, has 
almost exactly the same coefficient of correlation as that found when 
the Burt scale is correlated with school achievement (.72). But to 
conclude from these coefficients that the tests are equally useful in 
predicting school success or in getting an index of an individual’s 
capacity would be wrong, for it must be remembered that the mean 
mental age for the Dearborn is 10 years and a fraction of a month, 
whereas for the Burt the mean is 8 years and 5 months; while the 
mean IQ’s are 107 and 89, respectively.

The zero-order correlations with which we are dealing in this 
phase of the study are, then, as follows: 


Dearborn with Burt................................. .80 ± .027
Dearborn with school work.......................... .73 ± .03
Burt with school work.............................. .72 ± .03

From these the following partial coefficients are found:

Dearborn with Burt (school work constant).................. .58
Dearborn with school work (Burt constant).................. .37
Burt with school work (Dearborn constant).................. .33




1 B = .54S + .33Bu + .11A.

The regression equation for these three variables is:


D = .60Bu + .38S 
where D = intelligence as measured by the Dearborn group test of 
intelligence. 


Here, as in the case of the Binet, the percentile scores were used
in calculating the equation.

As a result of this equation, and of the equation found for the
Binet, as well as because of the coefficients of the zero and the
first orders, we cannot say that Burt’s views are confirmed with
respect to the nature of current tests of intelligence (if the Stanford-
Binet and the Dearborn are representative) or with respect to the
Reasoning test as a measure of pure intelligence, particularly so in
the light of the Burt mental ages obtained in this study.


VIII 


If, instead of dealing with the combined scores as a measure of 
school achievement, the independent scores are taken for arithmetic, 
rate of reading, and reading comprehension, interesting results are 
found, particularly with regard to the regression equation. 

For arithmetic: 


Zero-order:
Binet with Burt................................... .77 ± .03
Binet with arithmetic............................. .63 ± .04
Burt with arithmetic.............................. .67 ± .04
First-order:
Binet with Burt (arithmetic constant).............. .60
Binet with arithmetic (Burt constant).............. .24
Burt with arithmetic (Binet constant).............. .37
Equation:
B = .65Bu + .21A
where A = score in arithmetic.

Zero-order:
Dearborn with Burt................................ .80 ± .027
Dearborn with arithmetic.......................... .67 ± .04
Burt with arithmetic.............................. .67 ± .04
First-order:
Dearborn with Burt (arithmetic constant)........... .64
Dearborn with arithmetic (Burt constant)........... .30
Burt with arithmetic (Dearborn constant)........... .30
Equation:
D = .67Bu + .25A


There are several things which stand out in this group of figures. 
First, there is a rather high first-order correlation for both the Dearborn
and the Binet with the Burt, indicating that even by holding
constant an important factor in schooling like arithmetic, the rela- 
tionship still remains quite high. The second noteworthy fact is the 
very marked drop in the coefficients for the tests with arithmetic 
when one test or the other is held constant. This demonstrates 
perhaps the low degree of relationship between the tests and achieve- 
ment in arithmetic when the complex functions involved in the tests 
of intelligence are “partialed out.” And, third, the relative sizes of
the regression coefficients show clearly that even if we were prone
to interpret them in terms of “contribution” or “dependence,”
we should have to assign to arithmetic a rôle of decidedly minor
importance.


IX 


Inasmuch as there is no time element in the Reasoning test, it is 
not surprising to find that its correlation with rate of reading is only 
.34, and that the partial coefficients are +.04 and —.04 when the 
Dearborn and the Binet, respectively, are held constant. In both 
the Dearborn and the Binet there are time limits in the sections 
involving language, but the partial coefficients for reading rate and 
these tests are quite low, particularly so in the case of the Dearborn. 

The coefficients and regression equations follow: 


Zero-order:
Binet with Burt................................... .77 ± .03
Binet with reading rate........................... .47 ± .06
Burt with reading rate............................ .34 ± .067
First-order:
Binet with Burt (reading rate constant)............ .74
Binet with reading rate (Burt constant)............ .35
Burt with reading rate (Binet constant)............ −.04
Equation:
B = .73Bu + .25Rr
where Rr is reading rate

Zero-order:
Dearborn with Burt................................ .80 ± .027
Dearborn with reading rate........................ .39 ± .06
Burt with reading rate............................ .34 ± .067
First-order:
Dearborn with Burt (reading rate constant)........ .77
Dearborn with reading rate (Burt constant)........ .21
Burt with reading rate (Dearborn constant)........ .04
Equation:
D = .79Bu + .14Rr
It should be apparent from these correlations and regression
equations that the place of reading rate in determining an individual’s 
score on the intelligence tests here discussed is an unimportant one. 


X


For reading comprehension the correlations are as follows: 


Zero-order:
EE EET OT ULE TT OCCOT TS OT CTTETE 65 + .04 
Binet with reading comprehension................. .72 ± .03
Burt with reading comprehension.................. .69 ± .04
First-order:
Binet with Burt (reading comprehension constant).. .54
Binet with reading comprehension (Burt constant).. .42
Burt with reading comprehension (Binet constant).. .31
Equation:
B = .54Bu + .37Rc
where Rc is reading comprehension


Zero-order:
Dearborn with Burt............................... .80 ± .027
Dearborn with reading comprehension............. .72 ± .03
Burt with reading comprehension.................. .69 ± .04
First-order:
Dearborn with Burt (reading comprehension constant) .60
Dearborn with reading comprehension (Burt constant) .39
Burt with reading comprehension (Dearborn constant) .27
Equation:
D = .60Bu + .32Rc


XI


In almost every instance, when these three elements in school 
achievement are taken separately, it is found that the regression 
coefficient is considerably smaller than that for the combined score. 

For several reasons it must be concluded that this investigation 
does not confirm Burt’s findings. The results here obtained with the 
Reasoning test do not justify its acceptance as a criterion of pure 
intelligence. The coefficients of correlation do not warrant the 
conclusion that the Binet and similar tests are measures largely of 
school attainment. The regression coefficients signify the same thing. 

The writer of this paper does not feel that his results are indis- 
putably conclusive. He believes rather that they support Professor 
Thomson’s view that the experiment needs wide repetition. And he 
believes, too, that statistical procedure and statistical indices being 
used with educational data must be interpreted with a clear recog- 
nition of their limitations. 


RAPID CORRELATION BY AN EMPIRICAL METHOD
KARL D. WOOD 


Cornell University 


AIMS AND METHODS OF CORRELATION COMPUTATION


One of the most powerful instruments for the discovery of new 
truth in the biological, social and psychological sciences is the coefficient
of correlation. If a relationship between two hitherto supposedly
unrelated quantities can be found, this is a fact which, with other such 
facts, may form the basis for theories or laws from which useful 
deductions may be made. The field of science of which Karl Pearson’s 
magazine has long been an exponent (Biometrika) has extended to 
economics, sociology, psychology and education, so that in each of 
these fields thousands of correlations have been determined. The 
accepted best method of determining the coefficient of correlation 
between two quantities was for some time the Pearson product-
moment method, based on the “method of least squares” as explained
in Brown.1 A much shorter method of nearly equal accuracy for getting
the same result has been proposed by Ayres2 and has almost
completely replaced the original method in correlations in the field of
educational measurement.3 The proposed empirical method professes
to be shorter than even the Ayres Shorter method, adequately accurate, 
and less susceptible to numerical mistakes. 

Method of Investigation.—This investigation consisted of determin- 
ing the Pearson coefficient of correlation (r) for 19 different pairs of 
measures and plotting these values of r against various factors obtained 
from the corresponding quartile distribution diagrams. To explain 
this, it is first necessary to explain in detail how a quartile distribution 
diagram is obtained and what “factors” are determinable from it.

Figure 1 shows the distribution of the scores made by 103 students 
on a particular group of 15 problems (Quiz No. 2, first half). These 
scores are values of what is here called Measure S. In this sketch the 
height of the bar which stands above each group of scores on the base 
line represents the number of students getting scores in that group. 





1 Brown, William: “The Essentials of Mental Measurement.” P. 61.

2 Ayres, L. P.: A Shorter Method for Correlation. Journal of Educational
Research, Mar., 1920.

3 Rugg, H. O.: “Statistical Methods Applied to Education.” P. 274.
Thus Fig. 1 shows that 5 students made scores of 2 or 3 on this group
of problems, 11 students made scores of 4 or 5, 21 students made scores 
of 6 or 7, etc. It is also assumed that the group of scores labeled 2, 3, 
means scores from 1.1 to 3.0, and that the group 4, 5 means scores 
from 3.1 to 5.0. The median score on this sketch is determined by 
finding that score on either side of which half of the scores lie. Since 
half of the total of 103 is 51.5, the median is determined by adding up 
the numbers in each group, beginning at either end of the diagram, say 


. the left, until the total is 51.5. The score at which this total is reached 


is by definition the median score. The sum of the first five columns 
from the left of the diagram is 1 + 6 + 5-+ 11+ 21 = 44. Thesum 
of the first six columns from the left is 44 + 21 = 65. The median 
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Fis. 1. Distrisvrion or Measures §; 
Scores on fiasr HALF oF Quiz No.2. 


therefore lies in the range of scores from 7.1 to 9, since 51.5 is between 
44 and 65. The median may be estimated on the following assump- 
tion, which, though arbitrary, is probably close enough: The distance 
of the median above 7 (the beginning of this group) may be assumed to 


be (51.5 — 44)/21 times the range of scores covered by the group, or 
7.5 15 


a1 * 2= 51 = 0.7. The median is therefore 7.7. The lower 
quartile, which is defined as that score which was exceeded by three- 
quarters of the students, and the upper quartile, which is similarly 
defined as that score which was exceeded by only one-quarter of the 
students, were determined in a similar manner. For this distribution 
the lower quartile was found to be 5.1 (Q: = 5.1) and the upper 
quartile 10.1 (Qs; = 10.1). The median is designated by the symbol M. 
These three values, Q:, M and Q;, divide the students into four approxi- 
mately equal groups: 23, 22,26 and 32. These “quarters”’ of the class 
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are not exactly equal because the scale of scores consisted of such 
relatively large divisions that no values of Q:, M and Q; would divide 
the class into exactly equal groups, but the quarters are nearly enough 
equal for our purposes. 
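
The interpolation used above for the median can be written out as a short sketch. The function and argument names are illustrative assumptions, not part of the original method; the numbers are those quoted for Fig. 1 (44 scores below the group beginning at 7, 21 scores in that group, a group width of 2, and a target count of 51.5).

```python
def interpolated_score(group_start, group_width, count_below,
                       count_in_group, target_count):
    """Estimate the score at which the running total reaches target_count.

    Assumes, as the text does, that the scores are spread evenly
    across the group in which the target count falls.
    """
    fraction_into_group = (target_count - count_below) / count_in_group
    return group_start + fraction_into_group * group_width


# Median of Measure S: half of the 103 scores is 51.5.
median_s = interpolated_score(
    group_start=7.0, group_width=2.0,
    count_below=44, count_in_group=21,
    target_count=103 / 2,
)
print(round(median_s, 1))  # 7.7
```

The same function would give a quartile if target_count were set to one-quarter or three-quarters of the total and the counts for the relevant group were read from the diagram.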

Figure 2 shows the distribution of the scores of these same students 
in another group of problems (Measure P), the last half of Quiz No. 2. 
The quartiles and median determined as above divide the class again 
into the approximately equal groups of 24, 24, 26 and 29. 

Fig. 2. Distribution of Measure P: Scores on Second Half of Quiz No. 2.

Perfect positive correlation between the two measures (r = +1.0)
would require that every student who scored in the upper quarter of
measure S should also score in the upper quarter of measure P, and a
similar relationship should hold for the other quarters. Figure 3
shows that this is not the case, but can be made to show to what extent
the tendency is in that direction. Figure 3 is plotted to show that of
the 23 students who scored in the upper quarter of measure S (column
1 of Fig. 3) only 7 also scored in the upper quarter of measure
P; 5 more of these scored in the second quarter (between M and Q3)
of measure P, 8 scored in the third quarter, and 3 in the fourth quarter.
The chart also shows the number of each quarter of measure S which
scored in each quarter of measure P. This figure will be called a
quartile distribution diagram, or quartile diagram. It is similar to
the "scatter diagram" used in the graphical determination of the
Pearson correlation coefficient except that it specifies only the
quarter of each measure in which each student's scores lie rather than
the exact location on the scale of each measure. A "scatter diagram"
can therefore readily be converted into a quartile diagram by drawing
horizontal and vertical lines at each quartile and median. This is
often the quickest method of getting a quartile diagram, and should
take little more time than the plotting of the "correlation table"
for the "shorter method" of Pearson product-moment correlation
computation, which is usually considered to consume about half the
time required for the complete computation. If r could be obtained
by a single computation from the quartile diagram by some empirical
method, the total time for the computation by the empirical method
would then be only about half of that required for the conventional
shorter method. The method of doing this was then investigated.

Fig. 3. Quartile Distribution Diagram.

As was pointed out at the beginning of the last paragraph, perfect
correlation between the two measures would require that all of the
scores lie on the diagonal of agreement of Fig. 3. In actual cases,
however, this practically never happens, but it might be supposed that
if the correlation is positive, more of the scores would lie on the
diagonal of agreement than on the diagonal of disagreement, or that
the scores would tend to group themselves along the diagonal of
agreement. One of the hypotheses here investigated was, therefore,
that the coefficient of correlation is related in some way to the per
cent of the scores which lie on the diagonal of agreement of the
quartile diagram. This percentage, which was here given the name of
"quartile retention," is one of the "factors" that can be obtained from
the quartile diagram. The meaning of this and three other factors here
investigated is shown in Fig. 4. Referring to this figure, and
designating by N the total number of scores in the diagram, the four
factors investigated are:

1. Quartile retention, $Q = \frac{\Sigma A}{N}$
2. Biquartile retention, $B = \frac{\Sigma B}{N}$
3. Quartile difference, $D = \frac{\Sigma A - \Sigma C}{N}$
4. End quartile difference, $E = \frac{\Sigma B - \Sigma D}{N}$

Fig. 4. Diagrams Showing Meaning of Symbols. [Legend, partly illegible: "Quartile retention" (Q) is the per cent of measures ...]
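
As a rough illustration of how a quartile diagram and the two diagonal-based factors (Q and D) might be tallied, the sketch below sorts paired scores into quarters and sums the two diagonals. Several things here are assumptions rather than the author's procedure: the function names, the rule that a score exactly on a cut point goes to the lower quarter, the cut points for Measure P, and the eight score pairs, which are invented. The factors B and E are omitted because the cells they use are defined only in Fig. 4.

```python
from bisect import bisect_left


def quarter(score, cuts):
    """Index 0..3 of the quarter a score falls in, given cuts (Q1, M, Q3).

    A score exactly equal to a cut point is placed in the lower
    quarter (an assumed tie rule).
    """
    return bisect_left(cuts, score)


def quartile_diagram(s_scores, p_scores, s_cuts, p_cuts):
    """4 x 4 table of counts: rows are quarters of S, columns quarters of P."""
    table = [[0] * 4 for _ in range(4)]
    for s, p in zip(s_scores, p_scores):
        table[quarter(s, s_cuts)][quarter(p, p_cuts)] += 1
    return table


def retention_and_difference(table):
    """Return (Q, D): quartile retention and quartile difference as fractions."""
    n = sum(sum(row) for row in table)
    sum_a = sum(table[i][i] for i in range(4))      # diagonal of agreement
    sum_c = sum(table[i][3 - i] for i in range(4))  # diagonal of disagreement
    return sum_a / n, (sum_a - sum_c) / n


# Cut points for S are those quoted in the text; those for P are hypothetical.
s_cuts, p_cuts = (5.1, 7.7, 10.1), (5.0, 8.0, 10.0)
s = [3, 6, 9, 12, 4, 11, 6, 9]   # invented scores on Measure S
p = [4, 6, 9, 11, 12, 3, 9, 4]   # invented scores on Measure P
table = quartile_diagram(s, p, s_cuts, p_cuts)
q_factor, d_factor = retention_and_difference(table)
print(q_factor, d_factor)  # 0.5 0.125
```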


DATA AND RESULTS


The data from which the relationship of each of these quantities to
the coefficient of correlation was determined included nine correlations
from a thesis by Wood¹ and ten from unpublished data of Professor
P. J. Kruse related to intelligence testing of Cornell University
freshmen. The values of r range from 0.07 to 0.56, and of N from 60 to
225. The total of 19 correlations, while not as many as would be 
preferred, is probably sufficient to make the chance of an accidental 
but spurious relationship very slight. The data would be much more 
valuable if they covered a wider range of values of r, say from —0.2 
to +0.8, but unfortunately such data were not available. Since most 
of the correlations in educational work come within the range of the 
data, the relationships discovered should have considerable utility 
but should be used with caution outside of that range. 

Each of the factors Q, B, D and E was plotted against r, giving four 
graphs of 19 points each. On each graph the best-fitting line was 
drawn by eye so that half of the points would lie on either side of it, 
and the equation of each line was determined by the usual method of 
analytic geometry as a basis for the prediction of r from each of the 
factors. 

1 Wood, K. D.: "Study of True-false and Yes-no Examinations." Cornell
University Master's Thesis, 1926, p. 21.

Table I lists the errors of prediction of r based on each of the four
factors under investigation for each of the 19 correlation coefficients
which constitute the data. The summary at the bottom of the table
shows that the "quartile difference," $D = (\Sigma A - \Sigma C)/N$, is
the most accurate basis for the prediction of r, since both the average
and the median error of prediction of r based on D are less than the
average or median based on Q, B or E. This table shows also the
noteworthy fact that the median error of prediction of r on the basis
of D is only 0.03, whereas the median probable error of r itself (which
takes account of the possibility that the group for which the
correlation was computed was not a typical sample) was 0.06. Errors due
to the use of this empirical relationship between r and D appear
therefore to be only about half as great as the errors in r due to
inadequate sampling. From the normal probability curve (Rugg, op. cit.,
p. 153) it can be seen that a predicted value comes within 2 PE of the
true value in over 80 per cent of the cases; since the PE of r is here
twice the median error of prediction, this empirical relationship gives
values of r that are within the probable error (PE) of r in over 80 per
cent of the cases.
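
The "over 80 per cent" figure can be checked from the normal curve directly. The sketch below assumes normally distributed errors and uses the usual conversion of one probable error to 0.6745 standard deviations; the function name is an illustrative choice.

```python
import math

PE_IN_SIGMAS = 0.6745  # one probable error, in standard-deviation units


def fraction_within(k_probable_errors):
    """Fraction of a normal distribution lying within +/- k probable errors."""
    z = k_probable_errors * PE_IN_SIGMAS
    return math.erf(z / math.sqrt(2))


print(round(fraction_within(1), 2))  # 0.5, by definition of the PE
print(round(fraction_within(2), 2))  # 0.82, i.e. "over 80 per cent"
```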

From another point of view, since r as determined by the empirical
method involves both of these errors (sampling and prediction), if it is
assumed that the probable error of the total result should be the sum of
the probable errors of each computation, the empirical method may be
said to increase the PE of r from 0.06 to 0.09, an increase of 50 per cent.
The same effect would result from dividing the number of scores in the
correlation by 1.5² = 2.25, since the formula for PE of r is
$0.6745(1 - r^2)/\sqrt{N}$, which varies inversely as the square root
of N. The accuracy of the empirical method is therefore about the same
as would result from a Pearson product-moment correlation computation
using 1/2.25 times the number of items. A correlation of 225 items by
the empirical method therefore gives about the same accuracy as one of
100 items by the lengthy product-moment method. If, then, as Rugg's
Table X (op. cit., p. 404) shows, the least number of cases which good
educational practice justifies (PE of r less than one-third of r) is 50
when r is over 0.3 by the Pearson method, then the corresponding least
number of cases when the empirical method is used is 2.25 × 50 = 112.5,
taken here in round numbers as 125.
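
The arithmetic of this paragraph can be laid out as a small sketch. It follows the text's own assumption that the sampling and prediction errors simply add, and uses the conventional PE formula with the 0.6745 factor; the function names are illustrative.

```python
import math


def probable_error_of_r(r, n):
    """Conventional probable error of a Pearson r from n cases."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def equivalent_cases(n_pearson, pe_sampling, pe_prediction):
    """Cases the empirical method needs to match a Pearson r from n_pearson cases.

    Assumes, as the text does, that the two probable errors add,
    so the PE is inflated by (pe_sampling + pe_prediction) / pe_sampling.
    """
    inflation = (pe_sampling + pe_prediction) / pe_sampling
    return n_pearson * inflation ** 2


print(round(equivalent_cases(100, 0.06, 0.03), 1))  # 225.0
print(round(equivalent_cases(50, 0.06, 0.03), 1))   # 112.5
```

Because the PE varies as 1/√N, inflating it by 1.5 is offset only by 1.5² = 2.25 times as many cases, which is the factor used in the text.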

This indicates that the saving in time by the use of the empirical 
method is more than offset by the greater number of cases required 
for a given accuracy; with the same number of cases the empirical 
method might require only 60 per cent as much time as the Pearson, 
but for equal accuracy, using the minimum number of cases for each 
computation, the time required for the empirical method would be 
2.25 × 60 per cent = 135 per cent as much time, or 35 per cent more
time. This added time might, however, be justified by the lessened
likelihood of numerical mistakes.


TABLE I. ERRORS OF PREDICTION OF r BASED ON VARIOUS FACTORS FROM THE
QUARTILE DISTRIBUTIONS

Prediction equations (from Graphs 1-4): [illegible in source]

                                      Basis of prediction
Quiz No.        r     PE of r      Q       B       D       E
    1          .07     .06       -.11    -.07    -.03     .07
    2          .10     .07        .09     .07     .12     .09
    3          .18     .07        .13     .05     .02     .07
    4          .18     .06       -.01     .02    -.03     .04
    5          .19     .04       -.03    -.05    -.02     .05
    6          .23     .05        .01     .10    -.05     .12
    7          .25     .09       -.03    -.07     .01     .08
    8          .30     .04        .21    -.06    -.11     .06
    9          .34     .07       -.02    -.09    -.01     .09
   10          .36     .06       -.04     .14     .00     .15
   11          .38     .04        .10     .09     .02     .04
   12          .38     .05       -.08    -.09    -.01     .16
   13          .39     .05        .04     .00     .03     .00
   14          .42     .06       -.06    -.01    -.06     .02
   15          .46     .08        .18     .05     .19     .06
   16          .47     .04        .12    -.03     .05     .05
   17          .52     .08       -.04     .02     .04     .01
   18          .53     .06        .00    -.03     .01     .06
   19          .56     .04       -.08     .02    -.11     .00
Average error          .058       .072    .056    .048    .064
Median error           .06        .06     .05     .03     .06

Note that median error of prediction of r based on D is only half the median
probable error of r itself.
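
The summary rows for the D column and for the PE of r can be recomputed from the entries as transcribed in Table I. This is only a consistency check on the table; the variable names are illustrative, and errors are averaged without regard to sign, which is the assumption under which the printed summary is reproduced.

```python
from statistics import mean, median

# Errors of prediction based on D, Quiz Nos. 1 to 19, as read from Table I.
d_errors = [-.03, .12, .02, -.03, -.02, -.05, .01, -.11, -.01, .00,
            .02, -.01, .03, -.06, .19, .05, .04, .01, -.11]
# Probable errors of r for the same 19 correlations.
pe_of_r = [.06, .07, .07, .06, .04, .05, .09, .04, .07, .06,
           .04, .05, .05, .06, .08, .04, .08, .06, .04]

abs_d = [abs(e) for e in d_errors]
print(round(mean(abs_d), 3), median(abs_d))      # 0.048 0.03
print(round(mean(pe_of_r), 3), median(pe_of_r))  # 0.058 0.06
```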


CONCLUSIONS


The above study indicates that the coefficient of correlation between
two measures can be determined empirically from the quartile
distribution diagram. The saving in time by this method (about 50 per
cent) is, however, accompanied by an increase of about 50 per cent in
the probable error of r; for the same probable error of r, the empirical
method requires 2.25 times as many cases, and therefore probably takes
appreciably more time than the shorter methods of computing the
Pearson r.

The equation for prediction of r from the quartile distribution
diagram which has the least median error of those investigated is:

$$r = 2.08\,\frac{\Sigma A - \Sigma C}{N}$$

where r = coefficient of correlation (expressed in the same units, per
cent or decimal fraction, as the quartile difference), ΣA = sum of cases
on one diagonal of diagram (diagonal of agreement), ΣC = sum of cases
on other diagonal of diagram (diagonal of disagreement), and
N = total number of cases in diagram.

This equation has been determined from values of r between 0.07
and 0.56 and from values of N from 60 to 225. It is probably incorrect
for values of r above 0.6 and should, of course, never be used for values
of N less than 60, preferably only for values of N over 125 and values of
r between 0.3 and 0.6.
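
A minimal sketch of how the prediction equation and its stated limits might be applied is given below. The function name, the choice to raise an error below 60 cases and to warn outside the fitted range of r, and the example counts are all assumptions made for illustration; r is returned as a decimal fraction.

```python
import warnings

SLOPE = 2.08  # coefficient quoted in the prediction equation above


def predict_r(sum_agreement, sum_disagreement, n_cases):
    """Predict r from the two diagonal sums of a quartile diagram."""
    if n_cases < 60:
        raise ValueError("the relation was not determined for fewer than 60 cases")
    r = SLOPE * (sum_agreement - sum_disagreement) / n_cases
    if not 0.07 <= r <= 0.56:
        warnings.warn("predicted r lies outside the range 0.07 to 0.56 of the data")
    return r


# Hypothetical diagram: 40 cases on the diagonal of agreement,
# 20 on the diagonal of disagreement, 103 cases in all.
r_hat = predict_r(sum_agreement=40, sum_disagreement=20, n_cases=103)
print(round(r_hat, 2))  # 0.4
```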
SOME FURTHER DATA ON THE LANGUAGE
HANDICAP


MELVIN RIGG 
Kenyon College 


The day is past when the psychologist can calmly sit down with a 
stack of Stanford-Binet blanks, determine the median IQ’s for a few 
children of native American, German, Jewish, and Italian descent, and 
proceed without delay to announce to the world the relative intelli- 
gence levels of the nationalities represented. 

Three major problems present themselves: First, the matter of 
sampling; second, the influence of social status; third, the question of 
language handicap. 

No convenient means have appeared of determining the extent to 
which the various nationalities of the world are adequately represented 
by their immigrants to the United States. It is well to remember, 
consequently, that results obtained with immigrants apply primarily 
to these same immigrants. 

Attempts have been made to study the influence of social status by 
dividing children into groups determined by the occupation of the 
father, the Taussig Scale usually being employed. Both Arlitt¹ and
Bere² report a rise in the Binet ratings as one proceeds from lower to
higher groups. Arlitt, however, found native white children higher 
than Italian or negro children of the same occupational levels, and 
Bere found children of southern Italian descent lower than Bohemian 
or Hebrew children of the same levels, although the three groups com- 
bined fall considerably below the native white children reported by 
Arlitt. 

But perhaps the foreign children have a language handicap. A 
child living in a foreign environment, possibly hearing English only 
at school, may well be at a disadvantage. Brown³ sometimes found
foreign children who rated from 6 to 18 months higher if tested in their 
own language. 

Brigham⁴ found that the army ratings were higher for foreign born
soldiers of longer residence in the United States. But the gain in 
Beta is about the same as the gain in Alpha and he concludes, not that 
there is a language handicap, but that the character of our immigration 
has been deteriorating. 

Various investigators, Pintner,⁵ Keller,⁶ Murdoch,⁷ Young,⁸
have found that foreign children do relatively better on non-verbal
tests than on verbal.* This contention, however, is weakened by the
investigation of Stockton,⁹ who found that native children of low
ratings also did relatively better on pictorial tests than on tests involving
language. The better showing on the non-verbal tests can not, there- 
fore, be taken to prove in the case of the foreign children the existence 
of language handicap. And we must not forget that verbal tests corre- 
late much better with scholarship than non-verbal tests. Many inves- 
tigators believe that a lack of ability in tests involving language 
means a lower capacity, not a different capacity. 

The language handicap, moreover, ought to be an equal impedi- 
ment to the several foreign groups. Italian immigration is hardly 
more recent than Jewish, and the effect of hearing Italian at home 
ought not to be more baneful than the effect of hearing Yiddish. The 
view that there are peculiar differences in the kinds of ability found in 
these two nationalities is weakened by Seago and Koldin,¹⁰ who
announce that when twelve year old Jewish and Italian children of 
the same rating on the National Intelligence Test are paired against 
each other, no significant differences are revealed between the two 
groups on the subtests. 

Bere also divided foreign children into groups determined by the 
amount of English spoken in the home. There is a rise in the Binet 
ratings as we pass to the homes where the most English is used. But 
there is also a rise in the Pintner-Patterson ratings. Furthermore, the 
children who did better on the performance tests than on the Binet do 
not necessarily represent the most recent immigration or the homes 
where the least English is spoken. The conclusion reached is that the 
more intelligent families learn English, rather than that their ratings 
are high simply because of this knowledge. Italian children are lower 
on the Binet than Hebrew or Bohemian children of the same groups. 

But although language handicap may be only a minor factor in the 
situation, and may not serve to explain entirely the lower test ratings 
of certain foreign groups, it may yet be a factor which should be taken 
into consideration. Colvin and Allen,¹¹ in comparing 173 Americans
with 163 Italians, found the two groups very nearly equal on a test of 
arithmetic fundamentals. On arithmetical reasoning the difference 
became noticeable, and on a reading test it was much increased. 

* Similarly, in Bere's study Italian children do relatively better on the
non-verbal tests, and surpass the Hebrews on the Pintner-Patterson.

This report suggests a finding made by the writer in an investigation
of the results of one of the St. Louis testing programs. Out of the
battery of tests which were given, three were selected for special
study because it was hoped they might throw light on the problem of 
language handicap. These were the National Intelligence Test, Scale 
A, Form 1; the Woody-McCall Arithmetic Fundamentals, Form 1; 
and the Thorndike-McCall Reading Test, Form 1. The children 
tested were in Grades III to VIII, inclusive, of the public schools, and 
none of the children represented in the following results are negroes. 
The foreign children were identified from answers to the question: 
What foreign language is spoken in the home? This classification is 
perhaps more reliable than might at first be thought, since the chief 
errors are probably cases in which the foreign language was simply 
not reported, and these children were swallowed up in the very large 
“native” group composed of children who reported no foreign language. 
A summary of the results is given: 














                             Intelligence         Arithmetic           Reading
                              quotients            quotients           quotients
Group            Number     Median     PE       Median     PE       Median     PE
               of cases              median              median              median
Native            8130      104.85     .17      102.96     .15      100.28     .17
Foreign           1949      103.30     .34      104.26     .31       98.06     .35
German            1095      104.69     .44      104.64     .39       99.61     .45
Jewish             445      103.19     .82      106.10     .72       98.35     .76
Italian            140       91.43    1.13       93.81    1.28       86.74    1.11
Bohemian           118      104.00    1.22      104.56    1.28       96.67    1.39
Miscellaneous*     151      105.14    1.08      104.33    1.04       98.47    1.18


























* This group earns the right to its appellation by the virtue of including the
following: French 26, Hungarian 20, Greek 16, Russian 14, Polish 10, Swedish
10, Spanish 8, Serbian 6, Croatian 5, Lithuanian 4, Ukrainian 4, Slovak 4, Chinese
4, Roumanian 3, Syrian 3, Bulgarian 1, Armenian 1, Danish 1, foreign but
unspecified 11.

In order that some idea may be had of the reliability of the differ- 
ences, the following table has been prepared showing for the pairs of 
groups named the ratio of the difference to the probable error of the 
difference.

VALUES OF DIFFERENCE / PE OF DIFFERENCE

Groups                        Intelligence   Arithmetic   Reading
Native and foreign                4.08          3.82        5.69
Native and German                  .34          4.00        1.40
Native and Jewish                 1.98          4.24        2.47
Native and Italian               11.77          7.09       12.09
Native and Bohemian                .69          1.24        2.58
Native and miscellaneous           .27          1.30        1.52

A ratio of 4 indicates conventional certainty that the errors due to random
sampling are not great enough to reverse the differences obtained; a ratio of 3
means that the chances are 98 in 100; a ratio of 2, that they are 91 in 100; a ratio
of 1, that they are 75 in 100 against such a reversal.

The arithmetic test involves practically no language; the reading
test is essentially one of language; the intelligence test stands
somewhere in between. The writer believes that there is a slight language
handicap. In spite of a reliable difference in favor of the native group
on the intelligence test, there is a reliable difference (chances 995 in
1000) in favor of the foreign group on the arithmetic test. And
arithmetic, unlike various non-verbal performance tests, may be expected
to correlate highly with scholarship. But the native group is again
ahead in reading, by a difference greater than either the intelligence
or arithmetic differences. It is probably somewhat hazardous to make
comparisons between the arithmetic and reading tests, although the
results have been reduced to quotients for that purpose, since the
norms in both cases may be questioned. Every group has a lower
standing on the reading test than on the arithmetic. But the foreign
children fall short to a much greater extent than the native. The
following table summarizes the drop between the median arithmetic
quotient and the median reading quotient for the groups:


Native ............................ 2.68
Foreign ........................... 6.20
German ............................ 5.03
Jewish ............................ 7.75
Italian ........................... 7.07
Bohemian .......................... 7.89
Miscellaneous ..................... 5.86
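
The ratios of difference to probable error, the "chances" attached to them, and the drops listed above can be reproduced from the medians in the summary table. The sketch below assumes that the probable error of a difference is the square root of the sum of the squared probable errors, and that the chances come from the normal curve with one PE equal to 0.6745 standard deviations; neither formula is stated in the article, and the function names are illustrative.

```python
import math


def ratio_to_pe(median_a, pe_a, median_b, pe_b):
    """Difference between two medians divided by the PE of that difference."""
    pe_difference = math.hypot(pe_a, pe_b)  # sqrt(pe_a**2 + pe_b**2)
    return abs(median_a - median_b) / pe_difference


def chances_against_reversal(ratio):
    """Normal-curve probability that the true difference has the observed sign."""
    z = 0.6745 * ratio
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Native versus foreign on the intelligence test.
print(round(ratio_to_pe(104.85, 0.17, 103.30, 0.34), 2))  # 4.08

# Chances corresponding to ratios of 1, 2 and 3.
print([round(chances_against_reversal(k), 2) for k in (1, 2, 3)])  # [0.75, 0.91, 0.98]

# Drop from median arithmetic quotient to median reading quotient, native group.
print(round(102.96 - 100.28, 2))  # 2.68
```

Some of the other tabled ratios agree with this computation only to within a few hundredths, which is consistent with the medians and probable errors having been rounded before printing.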


In view of these facts, the writer ventures the guess that the lan- 
guage handicap may be enough to explain the lower standing of most 
of the foreign groups on the intelligence test. Over half of the foreign 
children in this study are German. The study is, then, in agreement 
with many others in finding these, together with Jews and Bohemians, 
to be approximately at the native level. Southern Europe is repre- 
sented in the main by the group of 140 Italians. Although there is 
probably a language handicap operating here, the very low arithmetic 
quotient can not in this way be accounted for. The results in this 
respect are in agreement with those of Bere. 

Unfortunately, there is no measurement of social status, and there
remains the problem of sampling. It is not the intention of the writer
to make sweeping statements about national differences. Only a
careful comparison of thoroughly typical communities in each foreign
country could afford us a basis for such statements. The evidence
here submitted is another portion of a total amount which seems to be
cumulative, but which applies after all only to that group of persons
which can be thought to be adequately represented by the cases
included.


SUMMARY 


1. The article reviews some of the efforts which have been made to 
estimate the influence of social status and language handicap in the 
comparison of different nationalities. 

2. Some results are presented which tend to show that a small 
language handicap does exist which, however, is insufficient to explain 
the large variation in the case of the Italians. 
THE CONSTANCY OF THE IQ AND THE TRAINING OF
EXAMINERS


AGNES L. ROGERS, DOROTHY DURLING AND KATHERINE 
McBRIDE 


Bryn Mawr College 


There have been published several investigations presenting data 
on the constancy of the intelligence quotient and the reliability 
of the Stanford Revision of the Binet-Simon Measuring Scale of 
Intelligence. It is the purpose of this article to show what results 
are attained, when the examiners are students who have specialized 
in psychology and whose ability to give a mental examination as 
far as correct procedure is concerned has been determined before 
testing with extreme care. Lack of experience with testing and with 
pupils in schools must account for whatever change occurs in IQ’s 
in excess of what is customarily found. 

The subjects examined were pupils in public schools. They were
superior as regards physical and educational conditions. Almost
all of them came from average or superior home surroundings. In
connection with another investigation, the principals of the schools
were asked to classify the homes of the pupils tested according to the
criteria given by Terman.¹ Table I gives these criteria. The
principal of one school marked no home as unfavorable, the principal
of the other marked only four as belonging to this class. This is not
surprising as the schools are in prosperous suburbs. Thus the group
studied consists of pupils whose socio-economic background varies
from average to superior.


TABLE I. SCHOOL REPORTS ON HOME ENVIRONMENT

A. Probably favorable circumstances.
   1. Systematic home instruction.
   2. Well-educated parents.
   3. Travel.
   4. Excellent companions.
B. Probably unfavorable circumstances.
   1. Excessive indulgence.
   2. One or both parents dead.
   3. Parents divorced.
   4. Imperfect parental control.
   5. Unsuitable companions.
   6. Undue severity.
   7. Child obliged to care for home.
   8. Child has to look after self.
   9. Child left to care of nurse.
   10. Family below average mentally.
C. Unclassified.
   1. Average home.
   2. Not enough information to say.


Table II presents the distribution of the pupils’ chronological ages. 


TABLE II. CHRONOLOGICAL AGE AT FIRST EXAMINATION

Chronological age       Bryn Mawr        Wayne
4-0 to 4-11                  0              4
5-0 to 5-11                  1              8
6-0 to 6-11                 22             14
7-0 to 7-11                  3              4
8-0 to 8-11                  2              2
Median                 6 yr. 5 mos.   6 yr. 3 mos.
Number of pupils            28             32











Table III presents the mental maturity of the group at the first 
Binet-Simon Examination. 


TABLE III. MENTAL MATURITY AT FIRST TESTING

Mental age              Bryn Mawr        Wayne
3-0 to 3-11
4-0 to 4-11
5-0 to 5-11
6-0 to 6-11
7-0 to 7-11
8-0 to 8-11
9-0 to 9-11
Median                 6 yr. 6 mos.   7 yr. 6 mos.

[The counts for the individual mental-age groups are illegible in the source.]











The standing in IQ on the first application of the Stanford Revision 
is shown in Table IV. 





1 Terman, L. M.: "Genetic Studies of Genius." Vol. 1, Stanford University
Press, 1925, p. 77.
TABLE IV. INTELLIGENCE QUOTIENT ON FIRST EXAMINATION

IQ                      Bryn Mawr        Wayne
50 to 59                     0              1
60 to 69                     0              1
70 to 79                     1              0
80 to 89                     4              0
90 to 99                     4              3
100 to 109                  15              8
110 to 119                   3             11
120 to 129                   1              5
130 to 139                   0              2
140 to 149                   0              1
Median                     102            113
Lower quartile              90            100
Upper quartile             107            121





The intervals between the two applications are as indicated in 
Table V. 


TABLE V. INTERVAL BETWEEN FIRST AND SECOND APPLICATION OF THE STANFORD
REVISION

Interval in years       Bryn Mawr        Wayne
½                            0              0
1                            0              2
1½                           0              1
2                           21              2
2½                           4              4
3                            2              3
3½                           0              4
4                            1             10
4½                           0              4
5                            0              1
5½                           0              0
6                            0              1
Median                 2 yrs. 5 mos.  3 yrs. 8 mos.











The examiners on the first occasion were: Dr. Ada H. Arlitt, Dr.
Gertrude Rand and their students in graduate work at Bryn Mawr
College. The second examination was made by students specializing
in psychology at Bryn Mawr College in 1925-26 and 1926-27.¹

1 Those students who cooperated with the authors were Harriet Ahlers, Mary
Bell, Cecilia Baechle, Clare Hardy, Cornelia Hatch, Rose Huston, Angela Johnston,
Jean Loeb, Anne H. Morrison, Elizabeth Stubbs, Elizabeth Tyson.

The correlation between the IQ's on the two examinations was .75 ± .05
for the Bryn Mawr pupils and .69 ± .06 for the Wayne School pupils.
Results found by other investigators are summarized by Freeman.¹
We give these in Table VI.


TABLE VI. MEASURES OF THE VARIATIONS IN THE IQ ON RETESTING AS FOUND IN
SEVERAL TYPICAL STUDIES

                       Number    Percentage       Limits of       Average   Coefficient of
Investigator           of cases  differing 10     middle 50       change    correlation
                                 points or more   per cent                  between two tests
Terman¹                  435     15               -3.3 to +5.7      4.5         .93
Rugg and Colloton²       137     .12              -2.3 to +5.6      4.7         .84
Garrison³                468     .085             -2 to +4          5.4         .88
                                                  -3 to +4
                                                  -3 to +5
Rugg⁴                    114     [illegible]      -1.2 to +1.9      3.1         .95

1 Terman, L. M.: "The Intelligence of School Children." Chap. LX, 1919.

2 Rugg, H. and Colloton, C.: Constancy of the Stanford-Binet IQ as Shown
by Retests. Journal of Educational Psychology, Vol. XII, 1921, pp. 315-322.

3 Garrison, S. C.: Additional Retests by Means of the Stanford Revision of
the Binet-Simon Tests. Journal of Educational Psychology, Vol. XIII, 1922, pp.
307-312.

4 Rugg, L. S.: Retests and the Constancy of the IQ. Journal of Educational
Psychology, Vol. XVI, 1925, pp. 341-343.


For the 1154 cases reported by Freeman, the coefficients lie between
.84 and .95. Our own, .75 and .69, are somewhat lower, however, showing
the effect, it may be, of the lack of testing experience in students whose
preparation has been exceptionally thorough otherwise.

Table VII gives a complete record of the change in IQ for each 
school in detail. The same data are grouped in Table VIII. 

1 Freeman, F. N.: Mental Tests, p. 345.

Table VIII reveals that there is a difference of more than ten
points in 10.7 per cent of the Bryn Mawr School pupils tested and
31.2 per cent of the Wayne pupils. Taking the two schools together,
seventy-eight per cent had IQ's within ten points of their first IQ.
Freeman reported for the 1154 cases a percentage of 85 to 91 within ten
points of the first IQ attained. The middle 50 per cent of change ranges
from an increase of 5 points to a decrease of 7 points. The
corresponding figures obtained by Terman¹ are from an increase of 5.7
to a decrease of 3.3.
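
The three percentages quoted above are mutually consistent, as the following check suggests. The counts of 3 and 10 pupils are not given in the text; they are back-derived here from the stated percentages and the group sizes of 28 and 32, and should be read as an assumption.

```python
def percent(part, whole):
    """Part of a whole, expressed as a percentage."""
    return 100.0 * part / whole


pupils = {"Bryn Mawr": 28, "Wayne": 32}
# Pupils whose IQ changed by more than ten points (back-derived, assumed).
changed_over_ten = {"Bryn Mawr": 3, "Wayne": 10}

bryn_mawr_pct = percent(changed_over_ten["Bryn Mawr"], pupils["Bryn Mawr"])
wayne_pct = percent(changed_over_ten["Wayne"], pupils["Wayne"])
total = sum(pupils.values())
within_ten_pct = percent(total - sum(changed_over_ten.values()), total)

print(f"{bryn_mawr_pct:.2f} {wayne_pct:.2f} {within_ten_pct:.2f}")  # 10.71 31.25 78.33
```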

















TABLE VII. [Change in IQ, in single points from above +20 to below -20,
tabulated by interval between tests, age at first test, and IQ group.
The column headings and cell entries are illegible in the source. The
column totals are: interval, 5, 14, 34, 3, 4; age at first test, 18, 42;
IQ group, 2, 13, 25, 12, 8; all pupils, 60.]



























































1 Terman, L. M.: "The Intelligence of School Children." 1919, p. 142.
TABLE VIII. CHANGE IN IQ

Change in IQ            Bryn Mawr        Wayne
+26 to +30
+21 to +25
+16 to +20
+11 to +15
+ 6 to +10
+ 1 to + 5
  0
- 1 to - 5
- 6 to -10
-11 to -15
-16 to -20
-21 to -25
-26 to -30
-31 to -35
Total                       28        [illegible]

[The counts for the individual intervals are illegible in the source.]





These results indicate how far a theoretical preparation, combined
with a careful written memory test of the directions for procedure in
applying the examination, which covers all the details on which
beginners in testing are apt to make mistakes, can equip the examiner
for this work.



A METHOD OF EVALUATING THE UNITS OF A TEST


E. L. CLARK 


Personnel Office, Northwestern University 


There is, in some quarters, a considerable amount of dissatisfaction 
with the essay type of examination. Many instructors are turning 
to forms of tests which may be objectively scored. Yet, as most people 
realize, the fact that an examination may be objectively scored does 
not guarantee that it is a valid test in all of its parts. The practice 
of some instructors is to make up a new objective test each time one 
is needed, profiting only slightly by the experience of previous tests. 
If a method is found by which the more valuable parts of the test may 
be saved to be used again, a test or series of tests may be built which 
are more valid and easier to administer than the conventional essay 
form. 

Such a method has been used by the writer to select valid units of 
common objective forms of tests in psychology. This method of 
evaluating the units of a test depends upon at least two conditions in its
use. In the first place, the units (the sentences, the blanks to be filled 
in, the choices to be made) which are to be evaluated should be of 
such a nature that they can be counted as either entirely right or 
entirely wrong. This condition is usually met in intelligence and 
educational tests. Second, there must be some criterion for ranking
in the order of their mastery of the material all individuals taking the 
test. The most convenient criterion is the total score on the entire 
test. If this is used the test should be composed of a large number of 
well selected units. When possible it is wise to supplement the total 
score by any additional information available which will help to get 
the students arranged in the order in which they know the material 
over which the test is given. 

The formula used for evaluating the units of a test is as follows: 

$$IV = \frac{P - D}{1 - D}$$
where

IV is the index of validity of the unit of the test being considered, 

D is the percentage of the group of students failing to answer cor- 
rectly the unit under consideration: it is the difficulty of the unit, 

P is the percentage of the “criterion group” who missed the particular
unit. When the entire tests of all the students have been arranged
in order of merit by the criterion mentioned above, the criterion group
for any particular unit is the D percentage of the class who are lowest.
Thus the criterion group varies in size as the difficulty of the units
varies. As an example, suppose that Question 1 (a unit) of a test has
been missed by 80 of a class of 200 students. D in this case is .40
since 80 is 40 per cent of 200. The criterion group for Question 1 is
the 80 individuals who are the lowest 80 of the class. If by examining
their tests it is found that 60 of them have missed Question 1, P is
determined to be .75 (60/80 is 75 per cent). Substituting these values
in the formula, $IV = \frac{P - D}{1 - D}$, we have
$IV = \frac{.75 - .40}{1 - .40} = .58$.
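
The computation just described may be sketched in Python. This is an illustration under stated assumptions and not the writer's own procedure: the name `index_of_validity`, the list layout, and the treatment of tied total scores (left in input order) are hypothetical choices. The data merely reproduce the Question 1 example, with D and P taken as proportions, so that IV = (P - D)/(1 - D).

```python
def index_of_validity(missed, total_scores):
    """Return IV = (P - D) / (1 - D) for one unit of a test.

    missed[i] is True when student i missed the unit.
    total_scores[i] is the criterion score used to rank student i.
    """
    n = len(missed)
    n_missed = sum(missed)
    if n_missed == 0 or n_missed == n:
        # D = 0 leaves no criterion group; D = 1 makes the denominator zero.
        raise ValueError("IV is undefined when D is 0 or 1")
    d = n_missed / n  # difficulty of the unit
    # Criterion group: as many of the lowest-ranking students as missed the unit.
    # Assumption: ties in total score are left in input order.
    lowest_first = sorted(range(n), key=lambda i: total_scores[i])
    criterion_group = lowest_first[:n_missed]
    p = sum(missed[i] for i in criterion_group) / n_missed
    return (p - d) / (1 - d)


# Question 1 of the text: 200 students, 80 miss the unit,
# and 60 of the lowest 80 are among those who miss it.
scores = list(range(200))                            # student i has total score i
missed = [i < 60 or 80 <= i < 100 for i in range(200)]
iv_question_1 = index_of_validity(missed, scores)
print(round(iv_question_1, 2))                       # 0.58
```

The criterion group is rebuilt for each unit because its size depends on D; a unit missed by exactly the lowest-ranking students would return 1.00.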

An assumption of this formula is that in a perfect test made up of
perfect units of varying difficulties there would be perfect agreement
between the units and the whole. If one-half of the students missed a 
given unit it would be missed by all of the individuals of the lower half 
of the group and by no others. It is also assumed that this condition 
holds for other percentages of difficulty (the percentage of the class 
missing the given unit). It is further assumed that if there is a chance 
situation the most likely percentage of individuals within the criterion 
group who miss the unit will be the same percentage as that of the 
entire group who miss it. Then for any difficulty the highest possible 
value of IV is 1.00. And for any difficulty IV may be zero. When P 
is equal to D, IV is zero; i.e., the ratio of individuals included in the
criterion group who miss the unit is the same as the ratio of all who
miss it to the entire group. (In the example above, to say that the IV 
of Question 1 is .58 is in reality stating that the number of students 
within the criterion group missing Question 1 is 58 per cent as many 
as it could be in excess of the most likely number by chance.) How- 
ever, if the more able individuals, the better students as indicated by 
the criterion, are the ones who miss the unit, IV becomes negative. 
The lower limits of the possible negative values are different for differ- 
ent D’s. But this difference in the lowest possible negative value of 
IV has little or no practical significance in the use of the formula 
because any unit which actually has any negative value is worthless. 

There are, however, some limitations to the use of this formula. 
In the first place, the IV is very subject to chance fluctuations when D 
is extremely high or extremely low. This limitation becomes less and 
less significant as the number of students taking the test becomes 
larger. As a tentative rule we may say that the difference between D
and D² must include at least ten individuals. A second factor is
that when the score on the entire test is used alone as a means of
determining the criterion groups the size of the IV for a unit tends 
somewhat to average higher as the number of units making up the 
test becomes less. While great care should be used to get the best 
possible criterion by which to rank the students in the order of their 
knowledge of the test as a whole, some investigations have shown that 
the IV’s on the average do not increase very much by greatly limiting 
the criterion. A third apparent limitation to the use of the formula is 
the amount of work involved in computing the IV’s. This limitation 
is likely to be a real one the first few times this method of evaluation is 
used but it becomes unimportant as one develops short cuts in its 
application. If the tests are piled in the order of their merit, and if 
some plan is developed for tabulating the difficulty of the units the 
data for the formula may easily be obtained. A table of IV’s for the 
most common D’s and P’s may be made, or a slide rule may be used for 
the computations. Practice in the use of the formula shortens the 
process greatly. 
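
A table of IV's of the kind suggested above, together with the tentative rule on numbers, might be produced as follows. The sketch is hypothetical throughout: the names `iv`, `iv_table`, and `enough_cases` are not from the article, and the rule is read here as requiring N(D - D²) to be at least 10, with N the number of students and D a proportion. That reading is an assumption.

```python
def iv(p, d):
    """IV = (P - D) / (1 - D), with P and D as proportions."""
    return (p - d) / (1 - d)


def iv_table(ds, ps):
    """Map each difficulty D to a row of IV's, one for each P."""
    return {d: {p: round(iv(p, d), 2) for p in ps} for d in ds}


def enough_cases(n_students, d, minimum=10):
    """Assumed reading of the tentative rule: N * (D - D^2) >= minimum."""
    return n_students * (d - d * d) >= minimum


table = iv_table(ds=[0.2, 0.4, 0.6], ps=[0.2, 0.4, 0.6, 0.8, 1.0])
for d, row in table.items():
    print(d, row)

print(enough_cases(200, 0.40))   # True: 200 * (.40 - .16) = 48
print(enough_cases(50, 0.05))    # False: 50 * (.05 - .0025) is under 3
```

Entries where P is less than D come out negative, which corresponds to the worthless units described earlier.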

Apparent advantages of this method are: first, more individuals 
are included than are often used in determining the value of a unit 
and, second, a numerical index for designating the relative value of 
units is obtained which is between zero and one and which is relatively 
free from being influenced by the difficulty of the unit, the number 
of students taking the test, and from the number of units used in the 
total test. 











SOME FACTORS AFFECTING TEACHERS’ MARKS 
CHARLES E. LAUTERBACH 


The Graduate School of Education, Harvard University 


In assigning marks to a set of papers will a teacher give higher 
marks to typewritten papers than to those prepared in longhand? 

I have always felt that the typewritten paper is favored, and that 
of two papers of approximately equal quality, the typewritten paper 
will receive the higher mark. In reading great numbers of papers I 
catch myself picking up a typewritten sheet with a feeling of relief 
and a sort of cordial, lenient attitude which I suspect results in a more 
liberal mark than the writer would otherwise have received. 

In order to find out whether typewriting does affect a teacher’s 
judgment in the assignment of marks, the following experiment was 
set up. A group of Grade VIII pupils was asked to write compo- 
sitions as part of its regular class work. The choice of the subject 
was left to the pupil. All compositions were written in longhand and 
included such topics as “An Exciting Incident,” ‘‘How to Make a 
Swing,” ‘The Warblers,” ‘‘War Prevention,” ‘Kindness to Dumb 
Animals,” etc. The papers were not corrected by the teachers and 
were not rewritten. 

Sixty papers were selected at random and copied on the typewriter. 
Each paper was copied as the pupil had written it. No attempt was 
made to alter spelling, punctuation, grammar, sentence structure, 
paragraphing, or any other element of form or composition. The 
only changes made were those imposed by the typewriting machine. 
The left-hand margin is automatically fixed on the typewriter and 
the number of words on a line is greater than in longhand. Spacing, 
too, is uniform, and legibility is an inherent characteristic of the 
typewritten sheet. The aim was to copy the paper just as the pupil 
himself would probably have written it on a typewriter. 

The resulting total of 120 papers was then divided into four sets of 
30 papers each, chosen at random, except that each set was made to 
consist of 15 papers in longhand and 15 in typewriting, with no 
composition duplicated in any set. Each set of papers was graded 
by 16 teachers. Fifty-seven different teachers cooperated in the 
marking, seven of them marking two sets of papers each. Nine
hundred sixty marks were thus secured on the longhand papers, and
960 marks on the same papers prepared in typewriting.

The teachers who marked the papers were all grade teachers or
English teachers with a median teaching experience of eight years.
Uniform instructions for marking were provided.


The distribution of marks assigned by these 57 teachers is shown
in Table I. The medians and means for the two groups of papers are
practically the same. The only suggestion that the typewritten
papers were favored lies in the fact that they received a larger number
of marks within the range of 95-100. But this advantage is offset by
the other fact that they also received a larger number of low marks.


TABLE I. THE DISTRIBUTION OF THE MARKS OF 57 TEACHERS ON 60 EIGHTH
GRADE COMPOSITIONS, WRITTEN ON THE TYPEWRITER (T. W.) AND ALSO IN
LONGHAND (L. H.)

Marks                 Typewriting    Longhand
95.0-100                  114            84
90.0-94.9                 128           134
85.0-89.9                 187           188
80.0-84.9                 157           207
75.0-79.9                 124           133
70.0-74.9                  91            78
65.0-69.9                  37            43
60.0-64.9                  38            33
55.0-59.9                  18            18
50.0-54.9                  41            25
45.0-49.9                   4             1
40.0-44.9                  12             8
35.0-39.9                   1             0
30.0-34.9                   2             4
25.0-29.9                   2             2
20.0-24.9                   2             2
15.0-19.9                   0
10.0-14.9                   2

Total                     960           960
Median                  83.38         83.22
Mean                    84.30         83.79
Standard deviation      13.60         11.40
Q1                      74.45         75.98
Q3                      90.09         89.42











The variability of the two groups of papers differs slightly, the
Standard Deviation for the typewritten papers being 13.6; for the
longhand papers, 11.4. It is interesting to note that the typewritten
papers show the greater variability. Because of the seemingly
greater uniformity of the typewritten papers, the contrary result was
to be expected. It may be that typewriting makes errors stand out
more prominently than longhand and that this factor attracts greater
penalties just as legibility and form may attract greater rewards.

[Graph: range of marks, lowest to highest, given by each of 16 teachers.
Panels: LONGHAND PAPERS; TYPEWRITTEN PAPERS.]

Illustrating the variability in range of the grades of 16 English teachers on two sets
of English compositions, one set prepared in longhand, the other typewritten.

While the experiment has not yielded conclusive results in regard 
to the relative merits of typewritten and longhand papers, it has 
yielded certain interesting incidental data. That teachers’ marks are
unreliable is an old story. That the scales which teachers use in 
assigning marks vary as much as the marks themselves has not been 
made as evident. For example, one teacher assigns marks within a 
range of 85-93, while another teacher assigns marks on the same set of 
papers within a range of 20-98. This fact is strikingly represented 
in the accompanying graph. The fact may be expressed by saying 
that the ranges on the longhand papers vary from 9 units to 78 units; 
on the typewritten papers, from 8 units to 78 units. The Standard 
Deviation of each teacher’s marks could be taken to indicate the same 
fact. One teacher has a Standard Deviation of 2.46 ± .30; another
of 26.4 ± 3.25.

There is, however, another aspect of this matter which deserves 
attention. If the teachers were asked to rank these papers in order 
of merit, would their judgments be as much at variance as their 
marks seem to indicate? In other words, how well would they agree 
in their choice of the poorest or the best paper in the lot? 

The question can be answered, at least in part, by determining the
group rank of each paper and then finding out how closely the teachers’
ranks correlate with the group ranks. Each teacher’s papers were
therefore ranked according to the marks assigned. The ranks assigned
each paper were totaled, and that paper which received the lowest
total of ranks was given a group rank of first; the paper receiving
the second lowest total of ranks was given a group rank of second;
and so on.
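
The ranking procedure may be illustrated by a short Python sketch. It is offered as one possible reading and not as the method actually followed: the marks are hypothetical, rank 1 is assumed to go to the highest mark, tied marks are assumed to share the average rank, and the names `ranks`, `pearson`, and `group_ranks` are invented for the illustration.

```python
from statistics import mean


def ranks(marks):
    """Rank 1 = highest mark; tied marks share the average rank (assumption)."""
    order = sorted(range(len(marks)), key=lambda i: -marks[i])
    result = [0.0] * len(marks)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and marks[order[end + 1]] == marks[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result


def pearson(x, y):
    """Pearson coefficient of correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def group_ranks(marks_by_teacher):
    """Total each paper's ranks; the lowest total receives group rank 1."""
    per_teacher = [ranks(m) for m in marks_by_teacher]
    totals = [sum(column) for column in zip(*per_teacher)]
    # Negate the totals so that the lowest total is ranked first.
    return per_teacher, ranks([-t for t in totals])


# Hypothetical marks of three teachers on the same four papers.
marks_by_teacher = [
    [85, 88, 90, 93],   # narrow scale
    [20, 55, 70, 98],   # wide scale, same order of merit
    [60, 75, 70, 95],   # two papers interchanged
]
per_teacher, group = group_ranks(marks_by_teacher)
for teacher_ranks in per_teacher:
    print(round(pearson(teacher_ranks, group), 2))
```

The first two teachers use very different scales yet agree perfectly with the group rank, which is the distinction between marks and ranks that the discussion turns on.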

A value was thus secured for each paper based upon the judgment
of 57 teachers. The teachers’ ranks were then correlated with these
group ranks. Pearson coefficients of correlation were derived as
shown in Table II:

TABLE II. PEARSON COEFFICIENTS OF CORRELATION SHOWING THE AGREEMENT
BETWEEN TEACHERS’ RANKS AND THE GROUP RANK OF 60 EIGHTH GRADE
COMPOSITIONS WRITTEN ON THE TYPEWRITER (T. W.) AND ALSO IN
LONGHAND (L. H.)

Set           Longhand, r      Typewriting, r
1             .61 ± .027       .57 ± .029
2             .94 ± .005       .58 ± .028
3             22 ± .017        .74 ± .019
4             .68 ± .023       .74 ± .019
All sets      .70 ± .010       .64 ± .012











An element of artificiality is inherent in the correlation tables 
from which these coefficients were derived, due to the method of 
determining group rank, but the coefficients probably indicate the 
facts of the matter. In their judgments of quality as expressed by 
ranks teachers reach considerable agreement. They may agree quite 
closely in their choice of the best or poorest paper, but they use widely 
varying marks to express the value of that paper. Thus, while one 
teacher’s marks range from 85-93 and another teacher’s marks range 
from 20-98, there may be no difference of opinion whatever as to
which is the poorest paper; only, one teacher has marked it 85 while
the other has marked it 20.

Whether such marks represent differing degrees of ‘‘ poorness”’ 
is still another matter, but the facts argue clearly for some sort of 
categorical marking scheme. Letter marks are probably as good as any 
and undoubtedly result in a fairer expression of the teacher’s judgment 
than percentage marks. 

So much has been written about the unreliability of teachers’
marks that further discussion seems unnecessary. However, it
appears from the data presented in this paper (1) that teachers’ marks
taken at face value have little significance; (2) that teachers’ marks, 
if considered as expressing relative standing within a group, denote 
the facts with fair accuracy. 

The particular grades assigned by these 57 teachers group themselves
as odd or even as shown in Table III:

TABLE III. THE DISTRIBUTION OF THE MARKS OF 57 TEACHERS ON 60 EIGHTH
GRADE COMPOSITIONS, WRITTEN ON THE TYPEWRITER (T. W.) AND ALSO IN
LONGHAND (L. H.), GROUPED AS ODD OR EVEN

Numbers                            Typewriting   Longhand   Both   Per cent
Even numbers ending in 0               358          342      700       36
Even numbers not ending in 0           186          205      391       21
Odd numbers ending in 5                296          306      602       31
Odd numbers not ending in 5            120          107      227       12
Total                                  960          960     1920      100











Even numbers have the preference. Fifty-seven per cent of the 
total number of marks assigned are even, while 36 per cent are even 
numbers ending in 0. Thirty-one per cent of the marks assigned are 
odd numbers ending in 5. The tendency seems to be to assign marks 
which are multiples of ten or five. Odd numbers not ending in 5 are 
avoided. The reason probably is that teachers assign marks with 
reference to the upper or lower limits of significant intervals. 
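
The grouping used in Table III can be sketched in Python as follows. The marks listed are hypothetical and serve only to show the classification; it is assumed that marks are whole numbers, and the name `classify` is invented for the illustration. The last lines check the share of even marks from the totals printed in the table.

```python
from collections import Counter


def classify(mark):
    """Place a whole-number mark in one of the four classes of Table III."""
    if mark % 2 == 0:
        return "even, ending in 0" if mark % 10 == 0 else "even, not ending in 0"
    return "odd, ending in 5" if mark % 10 == 5 else "odd, not ending in 5"


# Hypothetical marks, for illustration only.
marks = [90, 85, 80, 88, 75, 93, 100, 70]
counts = Counter(classify(m) for m in marks)
for label, count in counts.items():
    print(label, count)

# Share of even marks from the totals of Table III: (700 + 391) of 1920.
even_share = round(100 * (700 + 391) / 1920)
print(even_share)   # 57
```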











A FURTHER NOTE ON THE RELIABILITY OF READING
TESTS


RAYMOND M. MOSHER 


State Normal School, New Haven, Conn. 


Introduction.—Current and Ruch have reported recently on the 
reliability of reading tests. Their study outlined briefly what had 
been previously accomplished by Monroe and Gates in the matter of 
establishing reliability coefficients and then proceeded with a discussion 
of the method of investigation which led to the data which they 
presented. 

The aim of this brief article is to offer the results of a study the 
purpose of which was to ascertain the degree to which results would be 
coincident with test data derived from a different pupil population 
reproducing as closely as possible the technique employed by Current 
and Ruch. There has been no effort to evolve a new technique. On 
the contrary the writer has endeavored to practice one aspect of the 
scientific method which has either fallen into disrepute or has been 
painfully neglected, namely—to follow as accurately as possible each 
step in his predecessor’s method in order to check on the equality of 
results. 

Method, Subjects, and Tests—With the exception of one factor the 
present study is similar to that of Current and Ruch. The difference
occurs at this point: 104 pupils in one school from Grades IV to VI were 
used, whereas Current and Ruch had a total pupil population of 154 
children distributed approximately equally throughout Grades IV to 
VIII. These children were enrolled in three schools. 

The schedule for administering the tests was such that one day 
intervened between each testing, and the tests utilized were: 


1. Monroe’s Standardized Silent Reading Test, Test II, Form 1.
2. The above, Form 2.
3. Courtis Silent Reading Test No. 2, Form 1.
4. Idem, Form 2.
5. Stanford Reading Test, Form A.
6. Idem, Form B.
7. Thorndike-McCall Reading Scale, Form 1.
8. Idem, Form 2.
9. Lippincott-Chapman Reading Test.
10. Haggerty Reading Examination, Sigma 3, Form A.
11. Idem, Form B.


¹ Current, W. F. and Ruch, G. M.: Further Studies on the Reliability of
Reading Tests. Journal of Educational Psychology, Vol. XVII, 1926, pp. 476-481.


Therefore, in respect to the tests and their administration the 
writer has attempted to duplicate the materials and technique reported 
by Current and Ruch. 

Results.—The table on p. 273 gives the results of the study made by 
Current and Ruch and those of the writer. The data for the former 
investigators are presented in bold type, while those of the latter are 
in italicized type. It is not only interesting but rather significant that 
the coefficients and PE values are close. 


CONCLUSIONS 


1. The present study confirms the findings of Current and Ruch. 

2. The tests studied showed differences in reliability, i.e., from .65
to .92, when given to the same pupils.

3. Quite in accordance with the findings of Current and Ruch,
“probable errors of estimated true scores were computed which showed
that certain tests yield errors of measurement nearly twice as large
as the best tests available.”



NOTES ON ARTICLES IN EDUCATIONAL
PSYCHOLOGY IN CURRENT ISSUES OF
OTHER MAGAZINES


REPORTED BY JAMES E, MENDENHALL 


Research Associate, Institute of Educational Research, Teachers College 
Columbia University 











INTELLIGENCE TESTING 


A Comparative Study of the Intelligence of Indians in United States Indian 
Schools and in the Public or Common School. Thomas R. Garth and James E.
Garrett. School and Society, Feb. 11, 1928, 178-184. 3910 Indians in Grades 
IV to VIII were tested on the National Intelligence Test. The Indians in the 
public schools were younger and more intelligent than those in the United States 
schools. The mixed bloods made higher scores than the full bloods yet were 
inferior to the white control group. 

Sex Differences in 5925 High School Seniors in Ten Psychological Tests. William 
F. Book and John L. Meadows. The Journal of Applied Psychology, Feb., 1928, 
56-81. An examination of scores on separate tests reveals slight differences in
ability between groups. The plea is made for finding differences in specific 
abilities as measured by particular tests rather than differences in the total test. 


ABILITY GROUPING 


The Effect of Homogeneous Classification on the Scholastic Achievement of Bright 
Pupils. J. T. Worlton. The Elementary School Journal, Jan., 1928, 336-345. 
A comparison of 3000 elementary school children (grouped and ungrouped accord- 
ing to intelligence and achievement tests) revealed a decided superiority on the 
part of the grouped in accomplishment. 


PSYCHOLOGY OF LEARNING


Factors Influencing the Relative Economy of Massed and Distributed Practice in 
Learning. Theodore C. Ruch. Psychological Review, Jan., 1928, 19-45. A
summary of studies to date, most of which were made with college subjects.

The Effect of Practice on the Improvement of Silent Reading in Adults. L. A. 
Averill and A. D. Mueller. Journal of Educational Research, Feb., 1928, 124-128. 
With three 45 minute practice periods per week for three months, 16 seniors made 
an average gain of 99 per cent in rate of reading, comprehension remaining constant. 
The Thorndike McCall and unstandardized but objective tests were used. 

A Learning Curve Equation as Fitted to Learning Records. M. C. Barlow. 
Psychological Review, March, 1928, 142-160. By the method of least squares 
a curve is fitted to early practice scores in multiplication tests. The predictive 
value of early scores is thereby increased. 

Recall vs. Repetition in the Learning of Rote and Meaningful Material. Wm. 
Clark Trow. The American Journal of Psychology, Jan., 1928, 112-116. Ten
graduate students were practised and tested on digits, words, paragraph words,
paragraph ideas, at intervals of 1 day, 1 week, 15 weeks. The recall method
proved superior to re-presentation, recall with re-presentation, and massed periods.


NEW TYPE EXAMINATIONS


Some Faults Common in Informal Objective Tests Made by High School Teachers. 
Baldwin Lee. Educational Administration and Supervision, Feb., 1928, 105-113. 
An analysis was made of the types of errors occurring in the construction of 
objective examinations by graduate students. 

A New Correction for Chance in Examinations of Alternate Response Type. 
Harry A. Greene. Journal of Educational Research, Feb., 1928, 102-107. The 
paired true-false test (in which one statement checks another) may prove a more 
exacting method of checking up in part the element of chance on a limited number 
of points. 

Limitations of the True False Statement. Charles C. Weidemann. The Journal
of Educational Method, Feb., 1928, 214-215. 


EDUCATIONAL AND VOCATIONAL GUIDANCE 


Kindergarten Training and Grade Achievement. W. D. Commins and Theodore
Shank. Education, March, 1928, 410-415. Forty-five fifth graders who had 
kindergarten training were compared with the entire fifth grade of 130 pupils. 
There were practically no differences between the groups on the Stanford Achieve- 
ment Test. 

College Student Personnel Problems. II. An Analytic Study of the Student 
Personnel Problem. R. A. Brotemarkle. The Journal of Applied Psychology, 
Feb., 1928, 1-42. A comprehensive program for analyzing student personnel 
problems—achievement, college admission, hereditary factors, environmental 
training factors, economic factors, serial factors, general intelligence rating, specific 
mental abilities rating, temperament and emotional response rating, vocational 
interests. 

Predicting Achievement in College and after Graduation. John D. Beatty and 
Glen U. Cleeton. The Personnel Journal, Feb., 1928, 344-351. Results of 
psychological tests, scholastic standing, participation in extra curricular activities, 
importance of present position, yearly salary of graduate—these criteria for 90 
graduates of the College of Engineering and Industries of the Carnegie Institute of 
Technology were correlated singly and in combination. The correlations were low, 
0.40 and below. 

What Effective Guidance Techniques Are Being Administered through Tests and 
Measuring Devices. Virgil E. Dickson. The testing program is evaluated in the 
light of its possible contributions to the guidance problem. 

Predictive Value of Four Specific Factors for Freshman English and Mathematics. 
R. A. Kent and Esther Schreurs. School and Society, Feb. 25, 1928, 242-246. 
The records of 524 entering students of Northwestern University were studied. 
Mental alertness score (Thurstone test), high school quarter, number of units of 
Mathematics and English offered, and general achievement in all freshman 
studies: these four factors are compared.


STUDIES BEARING ON CURRICULAR MATERIALS


The Vocabulary of the Textbook and the Pupil. Frances E. Andrews. Chicago 
Schools Journal, Jan., 1928, 163-166. An analysis of the vocabulary of a ninth 
grade text on correlated mathematics was followed by a test of pupil vocabulary 
in the subject. The pupils failed more words than either the Thorndike or the 
Pressey list ratings would tend to predict. 

Measuring the Results of Physical Education. Luther Van Buskirk. The 
Journal of Educational Method, Feb., 1928, 221-229. A rating scale is constructed 
based on objectives in physical education as listed by 5 textbooks, 24 state courses 
of study, and 50 magazine articles. 

Experimentation in the Development of a Book to Meet Educational Needs. 
George C. Kyte. Educational Administration and Supervision, Feb., 1928, 86- 
104. Analysis of textbooks in use, a study of children’s interest in and rating 
of selected paragraphs, and a study of word and paragraph difficulty according 
to the Thorndike Word Book and to experimental results—these devices are used 
in setting up a social studies book for the fourth and fifth grades. 

The Construction of First-grade Reading Material. Nila Banton Smith. Jour- 
nal of Educational Research, Feb., 1928, 79-89. A method of constructing 
reading material in the light of child interest and scientific principles of learning is 
outlined. 


CHARACTER AND PERSONALITY 


A Score Card of Personal Behavior. Lloyd N. Yepsen. The Journal of Applied 
Psychology, Feb., 1928, 140-147. A method of rating in which described behavior 
responses are used. 

Measuring Introversion and Extroversion. Theodosia C. Hewlett and Olive 
P. Lester. The Personnel Journal, Feb., 1928, 352-360. Twenty freshmen 
college girls, self-rated as extraverts, were compared with 20, self-rated as intro- 
verts. An objective measure of expressiveness or introversion secured in an 
interview correlated with self-rating .064, with intelligence test score .068, with
Gilliland Test of Sociability 0.40 and with a rating by the dean 0.60. 


PSYCHOLOGY OF SCHOOL SUBJECTS


Summary of Reading Investigations. (July 1, 1926 to June 30, 1927.) IL. 
William Scott Gray. The Elementary School Journal, Feb., 1928, 443-459. 


NEW TESTS


A Scale for Measuring Dental Age. Psyche Cattell, School and Society, 
Jan. 14, 1928, 52-56. 

Preliminary Work on Tests for Stenographic Ability. The Staff of the Bureau 
of Public Personnel Administration. Public Personnel Studies, Feb., 1928, 46-55.
A method by which tests of stenographic ability may be standardized is presented. 

Additional Tests for Mechanical Drawing Aptitude. E. G. Stoy. The
Personnel Journal, Feb., 1928, 361-366.


MISCELLANEOUS


The Measurement of Opinion. L. L. Thurstone. The Journal of Abnormal and
Social Psychology, Jan.-Mar., 1928, 415-430. A statistical device for scaling an 
opinions test is presented. 

Typical Reading Disabilities of College Entrants. Beatrice F. Allbright and 
Floy Horning. California Quarterly of Secondary Education, Jan., 1928, 166-169. 
An analysis of 23,760 errors made by 1053 college freshmen on the Thorndike 
Intelligence Booklets, Form P, reveals most common types of error. 

Interests of Adults and High School Pupils in Newspaper Reading. Cecil L. 
Ross. School and Society, Feb. 18, 1928, 212-214. A Study of the kinds of 
material read by 1837 subway commuters. The items were submitted to high 
school students as a check list of interests. 

Are We Keeping Up with the High School Boys and Girls. J. U. Kinder.
Education, Jan., 1928, 291-300. A survey of leisure activities
(especially of kinds of reading) of 801 high school boys was made.



NEW PUBLICATIONS IN EDUCATIONAL
PSYCHOLOGY AND RELATED FIELDS OF
EDUCATION


CONDUCTED BY FRANCES M. FOSTER 











FIVE NEW PSYCHOLOGIES


Psychology, Its Facts and Principles, A New Contribution to Theoretical 
Psychology, by Harry L. Hollingworth. New York: D. Appleton 
and Co., 1928. Pp. 539. 


Here is that rather rare thing—a new and original contribution to 
theoretical psychology. One will attempt in vain to classify Dr. 
Hollingworth’s Psychology as a member of any existing -ism. It is a
new -ism all by itself. 

The author succeeds remarkably well in coordinating an entire 
systematic account of psychology about one principle—that of 
“Redintegration,’”’ which is foreign to the accounts of others and was 
hitherto used by him only in the discussion of limited fields—the 
psychology of meaning and of thinking. ‘“‘Redintegration”’ is some- 
thing old, something new. The name is adapted from the British logi- 
cians. The event which it describes is a very apparent statement of 
the essence of a psychological pattern, that a present detail (A) may
instigate an entire consequent (XYZ) because of its former membership
in an antecedent (ABC) which formerly instigated the consequent 
(XYZ). It will be seen that the redintegrative type of behavior distin- 
guishes that which is commonly called mental from that called physical. 
It is not, however, an “all or none”’ distinction, but a continuum. 
Events are more those of “‘mind”’ as they show greater redintegrative 
characteristics. There is no definite line of demarcation. 

Dr. Hollingworth, having defined his field as relating to “‘cue’’ 
behavior, stays within it. The book is notable in that it contains no 
chapters on physiology or neurology, and makes little attempt to 
bring out the usual mental-neural correlations. The ultimate value 
of describing conduct in neural terms is admitted, but its use in our 
present state of knowledge is seriously questioned.

All of the other traditional content of a system of psychology is
present. Perception, the role of language, images, meaning, learning, 
emotions, reasoning, intelligence, are not merely “treated,’’ but 
coordinated in the account of mental behavior. 

Dr. Hollingworth has taken some pains to make a usable textbook 
as well as a contribution to theory. Illustrations and examples are 
frequent. Appendices of problems, questions and exercises, and of 
well selected readings should render valuable assistance to the student 
and instructor. The suggested classifications of instincts and of 
emotions, little considered in the text proper, are given in tabular form 
in another appendix. That this psychology is a unique account does 
not seem to be a serious bar to its use as a textbook. It is no more 
one-sided than most of the “associationist” books in wide use. It is
different, but differs in many parts chiefly in its superior merit. 

Students of educational psychology will be especially interested in 
the discussion of learning. Adequate space is given to this topic in a 
formal way, and much of the rest of the book revolves about it. An 
entire chapter is devoted to summaries of experimental studies in this 
field. The troublesome problem of the “backward action” of the
so-called law of effect is dispatched with some nicety. An excellent 
account is given of the nature of the first performance of a task, of 
“problem solving.” 

The reader may feel a certain dissatisfaction with the theoretical 
basis of learning being limited to “the reduction of the stimulus to a 
more slight and subtle cue, which instigates the response.” Another
factor, which Dr. Hollingworth does not overtly recognize, is certainly
there: “the growth in recognition of more and more details as significant
and having a bearing on the situation.” This he considers as a
part of intelligence, but does not adequately apply to learning. 

Another possible weakness is in motivation. Only “a persisting
and annoying situation” is regarded as a motive. The drive of
satisfaction, the urge to the carrying on of an activity which is asso- 
ciated with the success of that activity—a salient fact found in every 
classroom—is ignored. 

All in all, however, Professor Hollingworth’s contribution seems 
to be the most notable attempt at a systematic restatement of psy- 
chology in some time. It certainly merits the careful inspection of 
every teacher and advanced student of psychology. 

LAURANCE F. SHAFFER. 


The Lincoln School of Teachers College. 


AN ADAPTATION OF THE PRINCIPLE OF REDINTEGRATION


Psychology: The Science of Mental Activity, by Frederick H. Lund. 
New York: A. G. Seiler, 1927. Pp. XX + 488. 


Lund’s “Psychology” is more than just another text. It is, as
far as the reviewer is aware, the first attempt to adapt the principle of
redintegration as advocated by Hollingworth to the reaction (S-R)
hypothesis as interpreted by Woodworth. The concept of redintegration
is, however, not introduced till the second half of the text
because the author thinks it specially applicable “to the higher functions
and to the more strictly mental (i.e., more variable and less overt)
activities.” The chapter headings are largely in accord with respectable
tradition, the only exception being the addition of a chapter on
belief and confidence—topics which the author has investigated. The 
thought activities and the physiological foundations of behavior are 
here treated more thoroughly and comprehensively than is customary. 
A feature which surprised the reviewer is the teleological classification 
of instincts. These are classified under two large captions: (1) 
Self Maintaining, and (2) Race Maintaining. A classification of 
original tendencies in terms of “ends” or “purposes” is more to be
expected in the writings of moralists, sociologists and psychopathologists
than in the writings of the scientific psychologist.

The book does not contain the leisurely illustrative materials and 
other kindly breaks found in Woodworth. It does not represent as
original an organization of subject-matter as do Perrin and Klein.
But the product proves the author to be a competent organizer and 
fairly lucid expositor of the known facts of psychology, and, in the 
opinion of Hollingworth, who writes the Introduction to this volume,
it succeeds in its endeavor “to present an impartial, systematic, and
yet concrete account of the human mind for beginners.”


H. MELTZER. 
Psychiatric Clinic, St. Louis, Mo. 





INTRODUCTIONS TO PSYCHOLOGY


An Elementary Psychology, by D. E. Phillips. Boston: Ginn and 
Company, 1927. Pp. V + 420. 

An Introduction to Psychology, by J. J. B. Morgan and A. R. Gilliland. 
New York: The Macmillan Company, 1927. Pp. IX + 319.


These two books are attempts to simplify psychology and make it 
comprehensible to immature students. To make scientific knowledge 
more comprehensible to more and more people without loss of accuracy
or the suggesting of false implications is no easy task. Any attempt 
to popularize a science is a worthy venture but it involves risk. Psy- 
chology is no exception. At best simplification results in humani- 
zation of knowledge; at worst, it is vulgarization. In the opinion of 
the reviewer neither of these two products is poor enough to be called 
vulgarized nor lucid and accurate enough to be called humanized. In 
their technique of simplification, as well as selection and organization 
of materials and point of view, the books differ greatly. 

The book of Phillips is the more literary, less scientific and cautious 
of the two. In spots it is characterized by strong and clear flow of 
thought. In his efforts to lure the student on and fill him with enthusiasm,
“to stimulate interest by giving a wide view of science, even to
the extent of suggesting hidden mysteries and unanswerable questions,”
the author frequently uses words which are famous for their
power to confuse and frustrate intelligent discussion. He goes to literature
for illustrations and for facts. He quotes Goethe and Ruskin
on feelings. His treatment, he says, is simple and unpretentious.
He gives neither time nor space to a succinct description of experi- 
mental work on emotions, but space enough to say a few words con- 
cerning the conceptions of a host of writers including Drummond, 
Ward, Bain, Bentham, Spencer, Herbart, Shand! Subjects included 
in this revised edition of the book, not included in the original, are: 
Psychology in Daily Life, Mental Hygiene, Organizing Our Personality, 
Heredity and Environment, The Group Mind, History of Psychology, 
and Psychology of Literature. The book has a high moral tone. Its 
“goal is enlightened feeling directed to healthy sane action.” This is
praiseworthy, indeed, but—is it science? Specifically is it psychology, 
and does it teach the student to psychologize rather than rationalize 
about human nature and behavior? 

The book by Morgan and Gilliland is designed for use in high school. 
Their argument is that “the majority of high school students do not
attend college; and this means that only a small percentage of the
population knows anything about a science which is of great importance
to everyone.” These authors, too, emphasize the importance of
“control, effort and the moral personality, in the hope that the study
of psychology may be an encouragement to the younger student to
allow his ideas to grow along healthy, rational and optimistic channels.”
But the tone smacks less of sermonizing and more of popularized 
mental hygiene. The style is less poetical, more prosaic. As a 
whole, the book may be described as a short, somewhat simplified
treatment of the materials contained under traditional chapter headings,
presented in a traditional academic manner, plus a chapter on
“Effort.” And the materials selected do, as the authors promise in
the Preface, deal only with “that side of psychology which through
experiment and observation, has a definite scientific basis for its
various conclusions.” The style generally is clear, straightforward
and fairly fluent. Such features as the treatment, in the chapter on
memory, of the forgetting of the unpleasant, with its exhortation not to
forget the unpleasant but to face reality, and the excellent chapter on
“Sleep and Dreams,” which shows a first-hand knowledge of the materials
treated, will remind many readers that one of the authors wrote “The
Psychology of the Unadjusted School Child.”

H. MELTZER.
Psychiatric Clinic, St. Louis.





A CRITIQUE OF “PSYCHOLOGY”


Psychology as Science: Its Problems and Points of View, by H. P. Weld.
New York: Henry Holt and Company, 1928. Pp. XI + 297. 


In “Psychology as Science” the author cites many facts of general
psychology and the special psychologies. The facts, though, are 
mentioned and critically considered, not systematically presented and 
explained. That is to say, this volume is not intended to be and is not 
a text in any aspect of psychology. It is intended for use as a text in 
a second course and in mimeographed form has been so used by the 
author in Cornell University for the past two years. It is a critical 
inquiry of the “logical approaches and fundamental problems of 
psychology” extended over the entire range of the field. Stated in
general terms, the major questions considered are: What is the meaning 
of science, the critical conception contrasted to the traditional? In
the light of that meaning how can the fundamental problems, basic 
concepts and definitions of psychology be interpreted? With what 
validity and accuracy from the point of view of psychology as science? 
With what implications from the point of view of technology? What 
light does a study of origins and development throw on the standards 
of reference used in the two aspects of psychology differentiated by 
the author: (a) Empirical psychology whose concern is with theory 
of conduct. (b) Existential psychology which treats mind as the
whole world of experience and satisfies the critical conception of
science as herein considered? 

Space limitations make a comprehensive description and evaluation
of this contribution impossible. Briefly, this can be said of it: It is a
scholarly work written by an intelligent man and understanding 
student who tries to be scrupulously fair and avoid controversy but 
whose thinking has been so largely influenced by the teachings and 
writings of Titchener that his point of view is now more in accord 
with Bentley than any other single psychologist. The interesting 
quotations which precede each chapter, the happy selection of words 
and the precise and fluent style, help make the materials delightfully 
intelligible.

H. MELTZER.
Psychiatric Clinic, St. Louis.





CURRICULUM-MAKING FOR TEACHER-TRAINING CLASSES 


The Technique of Curriculum-making, by Henry Harap. New York: 
The Macmillan Company, 1928. Pp. XI + 315. 


A course in the technique of curriculum-making arranged for 
about 60 meetings of a teacher-training class furnishes the motive of 
this book. The aim of the course is to help the student in the making, 
revision, or evaluation of a course of study, and the intelligent inter- 
pretation of other curriculum revisions now in progress. The student 
is familiarized with the steps actually taken in making or revising 
curricula. Each step is analyzed and discussed; reference materials 
bearing upon the step under discussion are assembled for study in a 
convenient form; and problems are suggested to give the pupil practice 
in independent work. This is a work book as well as an assembly of 
the viewpoints and materials concerning the technique of curriculum-
making.

The book is organized into 11 parts, Parts X and XI being extensive
bibliographies dealing with curriculum-making and curricular
investigations. Part I clears the ground for curriculum-making by 
defining terms. Parts II to V deal with the determination of objectives 
in curriculum-making, their organization and a thoughtful critique 
of five of the present methods of determining curriculum objectives. 
Parts VI and VII examine the nature and the manner of composing 
teaching units. Part VIII describes final steps in curriculum-making 
such as the formulation of tests and practice materials, adaptation to
individual differences, and curriculum revision. Part IX concisely
reviews the chief steps in curriculum-making and gives a brief critique 
of its present status. 

This book is a decided contribution to an important field. The 
assembly of materials and references will be welcomed by teacher- 
training classes and committees at work on the curriculum. It gives 
an all-round and dispassionate survey of the predominant techniques in 
the field of curriculum-making at present, adding frequent thoughtful 
evaluations. It steers a careful course between the Scylla of a too- 
predominant reliance upon child-interest and the Charybdis of formal 
courses of study. The average classroom teacher will find here an 
orientation point from which to direct her own activities and contribu- 
tions toward curriculum reconstruction. It should furnish a real 
stimulus to more widespread and more effective participation in the 
field of curriculum-making. 

The division into “lessons” seems somewhat academic, but as the
author points out, the material may be used in any way desired. In 
the opinion of this reviewer, the discussions place rather too much 
emphasis upon the “school” aspect of activities and not quite enough
upon first-hand observation of the natural activities, interests and 
characteristics of children in real life situations. However, in this 
respect, it is considerably in advance of many discussions of
curriculum-making.

ANN SHUMAKER.
Graduate Student, Teachers College.


ON THE RELATIONSHIP OF RATE OF WORK AND ABILITY


A Statistical Study of Certain Aspects of the Time Factor in Intelligence, 
by Fred C. Walters. New York City: Teachers College, Columbia
University, Contributions to Education, No. 248, Bureau of
Publications, 1927. Pp. VIII + 82. 


At one time the factor of speed was given more weight in the testing 
movement than it now has. The accumulation of evidence from many 
studies tends to show that the relationship between rate of work and 
ability is so low that it is unsafe to predict one from the other. It is 
a contribution to this general question and some of its implications 
which the author of this study has made. 

Two criteria were selected against which intelligence test results for 
half, standard, and extended time could be correlated. The first 
criterion was the Stanford Revision of the Binet Tests and the other
a composite taking into account school marks, progress, achievement
and teachers’ ratings as well as ability. The general conclusion 
reached on the basis of five well known tests is that a better prediction 
can be made from scores which represent the ability of the individual 
when he has time to finish all the items of which he is capable. 

By use of a skillful technique the author computes the amount of 
contribution which the speed factor makes to the test scores under 
standard time limits. The contribution of the rate of reading to these 
scores is also made a point of special study. 

The author is to be commended for the insight he has shown in 
seeing the implications involved in his problem and for the care he has 
used in developing his criteria and in analyzing and interpreting his 
data.

C. O. MATHEWS.
Ohio Wesleyan University.





A NEW IDEA IN INTELLIGENCE TESTS


A Study in Disguised Intelligence Tests, Interview Form, by Donald 
Scott Snedden, Ph. D. New York City: Teachers College, 
Columbia University, 1927. Pp. 48. 


Innumerable books and articles have been written upon “the art of
the interview,” each taking up the problem from a different standpoint 
yet each equally inadequate in that the estimate of an individual’s 
ability rested upon subjective judgment. 

It is for this situation, where a rating of intelligence is desired
without the cognizance of the subject, that the author has worked out the
ingenious “disguised intelligence test.” Realizing that the deficiencies
of previous methods in this field result largely from the use of personal
evaluations, he made the test completely objective and gave it under
standardized conditions.

Obviously the very nature of the interview necessitates many 
limitations. Not only must the test be short but it must be plausible 
enough to allay any suspicions regarding its purpose. Such a test was 
made up based on vocabulary and, masked as a questionnaire about 
parents, was administered to a selected group of 113 children. Each 
word used in the questions was carefully selected and rated as to 
difficulty according to the study of “Social Ethical Vocabulary”
made by Dr. Gladys Schwesinger in 1925. 

The results of “the disguised intelligence test” when correlated
with the criterion showed these coefficients of correlation: .8243 with
the Army Alpha Form 9; .8162 with the Otis Higher Self-administering 
Test of Mental Ability; and .8024 with the Terman Group Test. 

Occasions seldom arise where it is impossible to administer a 
general intelligence test when a measure of ability is desired. The 
value of this study lies in suggesting a method for these infrequent 
occasions.

MARCIA E. MENDENHALL.





STANDARD TESTS FOR THE ELEMENTARY AND THE SECONDARY SCHOOLS 


How to Measure, by Guy M. Wilson and Kremer J. Hoke. New York: 
The Macmillan Company, 1928. Pp. XXVI + 597. 


The measurement movement has been marked by a great increase 
in the number of standard tests and of books on the subject. There 
has come a cry for better tests rather than more tests; consequently, 
tests were evaluated in terms of their predictive power and of the 
worth of their content. In the field of textbooks for measurement 
courses, there has been a deluge of books which are largely duplications 
of each other in content. No distinct contribution by way of ingenuity 
of using tests is made. Moreover, all too many of these books have 
been mere assemblies of lists of available tests. It is indeed unfor- 
tunate that a great number of such books are ‘‘encyclopedic”’ in their 
nature, leaving to the over-burdened classroom teacher the tasks of 
selecting tests and of devising her own methods of treating and using 
data. A book in the measurement field which is to reach teachers 
must present its subject-matter in a clear and usable form. One 
somehow feels that the textbook writers should get into closer touch 
with the child in the classroom for an extended period of time. 

In this book the authors have advanced very sound theses in the 
preface: “First, that the work in measurement should be handled more
and more by the individual classroom teacher; and second, that the
chief purpose to be served by standard tests is the diagnosis of pupil
ability and pupil difficulties.” To this end they have presented a
very complete list and description of the tests available in the many 
subjects of both the elementary and secondary schools. Fifty pages 
have been devoted to the Criteria of a Standardized Test, Informal
Tests and the New Type Examination, Statistical Terms and
Procedures.

It is to be regretted that the authors have largely confined themselves
to a descriptive rather than a critical method in their consideration
of the many tests. It is somewhat difficult for the classroom
teacher to select from a large number of tests which have been described 
but not evaluated. This evaluation is certainly possible, for the worth- 
while tests in education have been submitted to experimental 
validation. 

On the whole the book covers very adequately the tests now avail- 
able, and makes a contribution in such fields as measurement in art 
education, in musical talent and mechanics, and in history and civics. 
It should prove a rather valuable reference book for obtaining informa- 
tion as to the tests which may be obtained to date. 


JAMES E. MENDENHALL. 
The Lincoln School. 